A NETWORK AND SECURITY ANALYSIS OF THE
U.S. INTERNET BACKBONE NETWORK
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
TABLE OF CONTENTS
ABSTRACT
1 PUTTING THINGS INTO PERSPECTIVE
    Introduction
    Statement of the Problem and Hypotheses
    The Internet
2 NETWORK CONCEPTS: A LITERATURE REVIEW
    Network Concepts and Constructs
    Network Analysis
    Geographic Literature of Networks & Information
    History of the Internet and Parallel Developments in Telecommunications
    Telecommunications and the City
    Data and Methodology
3 NETWORK ANALYSIS
    Analysis
    Alpha Index
    Connectivity Matrix
    Weighted or Valued Graphs and Shortest-Path
    Conclusion
4 THE U.S. INTERNET BACKBONE NETWORK: AN UNWEIGHTED ANALYSIS
    Complete Matrix
    Node-Removal Scenarios
    Pair Removals
    Summary
5 THE U.S. INTERNET BACKBONE NETWORK: A WEIGHTED ANALYSIS
    Bandwidth: An Overview
    Fully Connected Weighted Network
    Conclusion
6 UNDERSTANDING AND APPLYING THE RESULTS OF THE U.S. INTERNET BACKBONE NETWORK ANALYSIS
    Regression Analysis
    Kriging
    Surface Results
    Expanded Regression Analysis
    Important Nodes in the Internet Backbone Network
    Summary and Conclusions
    Directions for Future Research
REFERENCES
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
A NETWORK AND SECURITY ANALYSIS OF THE
U.S. INTERNET BACKBONE NETWORK
Chairperson: Timothy J. Fik
Major Department: Geography
Currently, as both the public and private sectors have become increasingly reliant
on Internet-related infrastructure, it is essential that the most valuable components of the
telecommunications system be identified and protected from disruptions and sabotage,
to ensure the proper functioning of the nation's economy and communications networks.
Any disruption that might lead to the loss of a network component could have
devastating consequences for both the overall network and the economy at large. In
light of these concerns, our study focuses on U.S. telecommunication networks and the
Internet backbone. Our study highlights the importance of spatial variability and
discusses the potential susceptibility of network components. More specifically, our
study outlines a methodology for identifying critical nodes and links in the U.S. Internet
backbone network. This type of analysis will aid policy-makers in the allocation of
resources when determining which infrastructures are most important to protect or
duplicate, to minimize the threat of a disruption. Identifying critical components is
essential for prioritization schemes that may be developed to add redundancy and
circuits to the network, to ensure its proper functioning in the event of an attack or
disaster. A graph theoretic approach is used to define and rank nodes and links and to
measure their importance to the overall network using both weighted and unweighted
scenarios. Implications of various node- and linkage-removal scenarios are also
discussed. Empirical results suggest clustering of telecommunication infrastructure and
bandwidth within large metropolitan locations, with regional variations in connectivity that
are not simply a matter of population size. Understanding the Internet as a "network of
networks" will aid in protecting and preserving the network, lessen component
susceptibility to disruptions, and enhance its overall efficiency.
PUTTING THINGS INTO PERSPECTIVE
On January 25, 2003, the Sapphire worm was launched by malicious hackers, and
reached global diffusion in 10 minutes. In the process, several bank automated teller
machine (ATM) networks failed, airline reservation systems ground to a halt, and the
entire Internet suffered a global slow-down. Results of the Sapphire worm show not only
the interdependency of telecommunication networks, but remind us of their vulnerability.
Before this cyber-attack, events of September 11, 2001, brought a heightened sense of
awareness and increased focus on the security of critical infrastructure in the U.S. As
the world becomes increasingly connected, because of the Internet and advanced
telecommunications technology, more attention has turned to issues of cyber terrorism.
The increasing dependencies on the ever-expanding telecommunications and
information technology (IT) have brought new concerns over security and susceptibility
to attacks. In the current atmosphere of tense international relations, it is now imperative
that law, policy, regulation, and technology are more fully integrated in the field of
telecommunications. This will help to ensure the stability and security of the nation's
critical, sophisticated, and valuable information infrastructures. Understanding the
geography of this infrastructure is imperative: the nation's economic security is highly
reliant upon this vital resource to support expanding financial networks and information
This research explores the location and connective properties of the Internet and
advanced telecommunications networks, concentrating specifically on physical
infrastructure, and more specifically, on the Internet backbone network. Though the
current role of urban planners and local government is minimal in the building of
telecommunication infrastructure, decision makers in the telecommunications industry,
as well as urban policymakers at the national level, have recognized the importance
of and need for policy and regulation.
Today's Internet is comprised of various infrastructures. The impact of
telecommunications infrastructure and the Internet on urban systems, businesses,
academia, government, and consumers has increased dramatically since its
commercialization. Users have become more reliant on Internet infrastructure and
technology to carry out basic functions and communication activities. Consequently, they
have become more vulnerable to network disruption. The first step in answering
questions of vulnerability and risk related to telecommunication networks is to
incorporate different types of telecommunication data into one common information
This research project includes various types of telecommunication and Internet
infrastructure data. The research also looks at present policy governing
telecommunication networks and infrastructure, as well as pending policy that will further
impact the industry. To aid in our understanding of this problem, it is imperative to
determine the geographical locations of those links and nodes that are most valuable to
the telecommunication industry: links and nodes that would have the greatest negative
impact should information flows through these assets become disrupted. The results of
the analysis will then be placed in the context of national and regional security.
Identifying the physical location and interdependencies of critical links and nodes will
allow for security emphasis to be redirected to those network components that are
currently most vulnerable to an attack, thereby creating a prioritization scheme for
creating greater redundancy to potentially minimize the impact of node or linkage
disruption. The methodology employed in this dissertation can be reapplied periodically
to reassess this network.
Statement of the Problem and Hypotheses
This research was intended to explore telecommunication network connectivity and
vulnerability. Currently, one of the most pressing issues of homeland security in the U.S.
is the protection of critical infrastructure. In response to growing concerns over domestic
terrorism, this research focuses on the telecommunication network and geographic
variations in its susceptibility to an attack. Extra insurance in the form of protection and
prevention is required to preserve the most-valuable links and nodes of this network.
Hence, the locational aspects of the most valuable components must be identified.
Any disruption that might lead to the loss of a node or link in a network could have
devastating consequences for the overall network. Determining the most critical links
and nodes will allow for increased protection of the infrastructure.
Preliminary research on this project indicates that highly connected nodes will be
the most critical to the overall network. It is hypothesized that the most important nodes
will house the most bandwidth. It is also hypothesized that the most highly connected
nodes presently house multiple interconnection facilities called colocation facilities (a
more detailed description of these facilities can be found in Chapter 2); the demand for
interconnection increases positively in direct proportion to fiber bandwidth. Although
these nodes may be the most directly connected in the network, they are not necessarily
at the top of the nodal ranking in terms of both direct and indirect connectivity. It is
further hypothesized that the most critical links will be connected to the most critical
This dissertation also will identify critical clusters of Internet activity on the east and
west coasts of the U.S. and other regional subnetworks or clusters. It is hypothesized that
the most highly connected places contain the largest amount of bandwidth and highest
number of connections to other places, thus housing the most infrastructure. There is,
however, a definite variability in the prominence of links and nodes, as well as a varying
degree of vulnerability. The complexity of a network multiplies as the number of links
and nodes comprising the network increases. As the complexity and size of the network
may vary, so might the vulnerability of each link and node within the overall network.
Links and nodes that are more critical to the overall network will have a greater impact
on the network should they become disconnected from the network. Furthermore, it is
hypothesized that the overall impact of the removal of a node or linkage may not be
obvious without a more in-depth analysis that considers all direct and indirect
connections. Connectivity indices will be used to determine the most critical links and
nodes in the network. It is hypothesized that places that are less prominent in the overall
network, but more highly connected in a regional sense, may be more vulnerable to an
attack and potentially more disruptive.
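The distinction drawn above between direct connectivity and combined direct and indirect connectivity can be illustrated with a small sketch. It assumes a toy five-node network with hypothetical node names and links, not the backbone data analyzed in this dissertation, and it uses one common formulation (row sums of a binary connectivity matrix, and row sums of the summed matrix powers); the indices actually used are defined in later chapters.

```python
# Toy undirected network. Node names and links are hypothetical,
# chosen only to illustrate the idea; they are not the dissertation's data.
NODES = ["A", "B", "C", "D", "E"]
LINKS = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]


def connectivity_matrix(nodes, links):
    """Binary matrix with a 1 wherever two nodes share a direct link."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, v in links:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


def mat_mul(a, b):
    """Plain square-matrix product, kept dependency-free."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_accessibility(matrix, max_steps):
    """Sum C + C^2 + ... + C^max_steps.

    Entry (i, j) counts the walks of 1..max_steps links from i to j,
    so it reflects indirect as well as direct connections.
    """
    n = len(matrix)
    total = [row[:] for row in matrix]
    power = [row[:] for row in matrix]
    for _ in range(max_steps - 1):
        power = mat_mul(power, matrix)
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return total


def rank_nodes(nodes, matrix):
    """Rank nodes by row sum, highest first (ties broken by name)."""
    scores = {name: sum(row) for name, row in zip(nodes, matrix)}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


C = connectivity_matrix(NODES, LINKS)
direct_rank = rank_nodes(NODES, C)
total_rank = rank_nodes(NODES, total_accessibility(C, max_steps=3))
```

In this toy case, nodes B, C, and E tie on direct connectivity (one link each), but B and C outrank E once indirect connections are counted, because they attach to the best-connected node.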
Finally, it is hypothesized that there will be great variability in the overall effect
of disruption or termination of flows through various nodes and linkages with regional
implications that are not obvious. When a link or node is damaged or disrupted, the
connective properties of nodes and links may change completely. By applying graph
theory to analyze the network, the structure and properties of the network will be
highlighted and components tested. This will allow for the identification of circuits and
redundancy that would increase the overall connectivity of the network and minimize the
impact of disruption.
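A node-removal scenario of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions: the five-node network is hypothetical, and the impact measure (the number of node pairs that can still reach each other after a removal) is one simple choice, not necessarily the measure used in the later chapters.

```python
from collections import deque

# Hypothetical toy network; not the backbone data used in this dissertation.
LINKS = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def surviving_network(links, removed=()):
    """Adjacency sets for the network after deleting the removed nodes."""
    removed = set(removed)
    nodes = {n for link in links for n in link} - removed
    adjacency = {n: set() for n in nodes}
    for u, v in links:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def connected_pairs(adjacency):
    """Count unordered node pairs that can still reach each other."""
    seen = set()
    pairs = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search to measure the size of this component.
        queue = deque([start])
        seen.add(start)
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        pairs += size * (size - 1) // 2
    return pairs


def removal_impact(links):
    """Connected pairs remaining after removing each node in turn."""
    nodes = sorted({n for link in links for n in link})
    return {node: connected_pairs(surviving_network(links, removed=[node]))
            for node in nodes}


baseline = connected_pairs(surviving_network(LINKS))
impact = removal_impact(LINKS)
```

In this toy network, nodes A, B, and D each have two direct links, yet removing D leaves far fewer connected pairs than removing A or B. The effect of a removal is therefore not obvious from direct connections alone.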
History of the Internet
The Advanced Research Projects Agency Network (ARPANET) was created in the
late 1960s as a network project of the Advanced Research Projects Agency
(ARPA) under the direction of the Defense Department. The Internet is a byproduct of
that project, and though appearing as a recent phenomenon, it is actually a
representation of decades of development (Abbate 1999). By the mid-1980s the "Net,"
as ARPANET was nicknamed, had transitioned into the hands of the National Science
Foundation (NSF). The NSF allowed the Net to be used strictly for academic and
research purposes (Boardwatch 1999). However, private firms and corporate developers
of early Internet technology, such as Bolt, Beranek, and Newman (BBN), a military
contractor and consulting firm involved in the early engineering and development of the
project, realized the economic potential of this research network (Schiller 1999). Firms
developed networks and collaborated with one another by interconnecting these new
networks to create a private-sector version of NSFNET for their corporate clients,
essentially duplicating the NSF's Net. The Internet switched from an academically
oriented network to a commercially oriented one in 1991, when the NSF decided to allow
commercial traffic across NSFNET (Thomas & Wyatt 1999).
Today the Internet is heavily used for e-commerce, including shopping, stock trading,
pornography, music downloading, file sharing, and real estate, and serves as a valuable
information resource. The Internet has become an important forum for news media.
Major broadcasting companies have websites with audio and video clips updated around
the clock, and newspapers have many sections of their printed version posted on
websites. The Internet has also radically enhanced personal communication via e-mail,
chat rooms,1 and instant messaging.2 The Internet has evolved from a single experimental
network serving a dozen sites in the United States to a "network of networks" linking
millions of computers and servers worldwide (Abbate 1999).
1 A chat room is a place or page in a Web site or online service where people can
"chat" with each other by typing messages that are displayed almost instantly on the
screens of others who are in the "chat room." Chat rooms are also called "online
2 Instant messaging is a service that alerts users when friends or colleagues are
online and allows them to communicate with each other in real time through private online
Infrastructure and the Net
The network of networks is comprised of links and nodes, a global network
connecting millions of computers worldwide to exchange data and news. There are
several levels of networks at work. Interconnection of these networks is key to the
functionality of the overall network. The Internet backbones are the long-haul routes that
link the nodes of the Internet to users.
The Core of the Global Net Is Centered within the U.S.
Although telecommunications began in America with the introduction and diffusion of
Samuel Morse's telegraph, the U.S. has not always been the world leader in
telecommunications networks. Great Britain developed superior telecommunication
technologies and communication networks early on in the 1870s (Hugill 1999). Abler
(1991) attributes the development of America's telecommunications industry to system-
specific software and a series of historic accidents, such as the failure of the U.S.
Congress in 1845 to see any value in its ownership of the patent on the telegraph.
During the formative stages of the Internet's development, the United States established
itself as a world leader in implementing the technologies and infrastructure needed to
develop and nurture this innovation. This would explain why, on a global scale, the U.S.
became the center of Internet activity (Cukier 1998, Finnie 1998, Malecki & Gorman
Cukier (1998) gives several reasons why the Internet is "U.S.-centric": (1) It had a
"head-start in building infrastructure and guiding the location of Internet content; (2) the
artificially high cost of cross-border capacity outside the U.S.; and (3) and customer
demand for Internet service" (p. 113). Dodge and Kitchin (2001) describe the distribution
of Internet users, stating that in February of 2000, the U.S. and Canada accounted for
136.06 million users with approximately 5% of the world's population; while Asia and
Europe combined accounted for 126.89 million users despite accounting for well over
60% of the world's population. Internet traffic patterns indicate that over half of European
Internet traffic and 70% of Asian Internet traffic travels through the United States. In
1999, the United States housed 58% of Internet hosts and content, and only 6% of the
100 most visited web sites were located outside of the U.S. (Cukier 1998). Although the
U.S. has a significant lead in telecommunications infrastructure, it is likely European
telecommunication growth will soon boost Europe to rival the U.S.
Major metro areas in the U.S. and the telecommunication companies that provide
Internet infrastructure within them are in a fierce competition for top ranking. Based on
Internet activity and infrastructure, these cities are in a constant shift of rank, depending
on what type of infrastructure is being measured. However, the same seven metro areas
continuously make the list: namely, New York, Washington, D.C., San Francisco,
Chicago, Dallas, Los Angeles, and Atlanta (Cukier 1998, Finnie 1998, Graham 1999,
Malecki & Gorman 2001, Moss & Townsend 2000). New York is the leading Internet hub
in the global economy, housing nine fiber networks, more than any other metropolitan
area (Finnie 1998).
The metropolitan rankings compiled by Atkinson and Gottlieb (2001) do not match
rankings that measure strictly telecommunication infrastructure because their research
includes other economic variables. The Metropolitan New Economy Index (Atkinson &
Gottlieb 2001) has compiled a ranking of U.S. metro areas based on five subcategories:
knowledge jobs, degree of globalization, economic dynamism and competition, the
transformation to a digital economy (including infrastructure), and technological
Internet Links: Backbones and ISPs
The term "Internet service provider" (ISP) is an overgeneralization that combines
both small, local Internet service providers and globe-spanning Internet backbones, and
the term does not differentiate among the various types of ISPs. An ISP is a company
that provides access to the Internet, serving individuals as well as large companies, with
direct links. ISPs can play vastly different roles. ISPs can be part of a major backbone
and a global network, or they may be local providers leasing infrastructure and servicing
a limited geographic market.
An Internet "backbone" simply can be defined as a collection of wires that connect
the Internet's nodes, linking them together so that they may exchange data. In more
complex terms, the Internet backbone is defined by the National Telecommunications
and Information Administration (NTIA) as "a set of paths that local area networks (LANs)
connect for long-distance connection. A backbone employs the highest-speed
transmission paths in the network. A backbone can span a large geographic area. The
connection points are known as network nodes or telecommunication data switching
exchanges (DSEs)" (Telegeography 2001, p. 102). International backbones were
defined by Telegeography (2001, p. 102) as "Private data links which cross international
political borders, run the Internet Protocol (IP), are reachable from other parts of the
Internet and carry general Internet traffic: e-mail, web pages, and most of the other
popular services which have come to define today's Internet."
A backbone firm, such as Sprint, may serve as an ISP itself, or an ISP may lease
access to the Internet from a backbone provider. At the same time, many different ISPs
utilize the same fiber. This means that different communications links, even when
obtained from different providers, may run over the same fiber, in the same bundle, or
in the same conduit (NRC 2001).
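The shared-fiber point above has a direct bearing on redundancy: links leased from different providers offer no real backup if they occupy the same conduit. A minimal sketch of how such overlap might be detected is given below. The provider names, routes, and conduit identifiers are all hypothetical placeholders, and the record layout is an assumption made for illustration.

```python
from collections import defaultdict

# Each record is (provider, route, conduit). All values are hypothetical.
LEASES = [
    ("ProviderA", "Atlanta-Dallas", "conduit-1"),
    ("ProviderB", "Atlanta-Dallas", "conduit-1"),
    ("ProviderC", "Atlanta-Dallas", "conduit-2"),
    ("ProviderA", "Chicago-New York", "conduit-3"),
]


def shared_conduits(leases):
    """Return conduits that carry links for more than one provider."""
    providers_by_conduit = defaultdict(set)
    for provider, _route, conduit in leases:
        providers_by_conduit[conduit].add(provider)
    return {conduit: sorted(providers)
            for conduit, providers in providers_by_conduit.items()
            if len(providers) > 1}


overlap = shared_conduits(LEASES)
```

Here a customer buying Atlanta-Dallas capacity from ProviderA and ProviderB would have two contracts but a single physical point of failure.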
Cukier (1998) classifies Internet service providers (ISPs) into four groups: (1)
backbone ISPs; (2) downstream ISPs; (3) online service providers (e.g., America
Online, Microsoft Network); and (4) firms specializing in Website hosting (e.g., Qwest)
(Table 2-1). Backbone ISPs include those that may connect the Internet globally and
transfer the largest amounts of data, such as Exodus, Globix, Sprint, MCI WorldCom,
Table 2-1. Internet service provider groups
ISP Group                           Example
Level 1: National backbone ISPs     SprintLink, MCI
Level 2: Downstream ISP             AOL TimeWarner, WorldCom
Level 3: Online Service Provider    America Online, Microsoft Network (MSN)
Level 4: Website Hosting            Qwest
According to Malecki and Gorman (2001), 48 national backbone operators
currently operate the transit network in the U.S. The next level is comprised of the
downstream ISPs, hundreds of local and regional ISPs that serve mainly individuals and
small and medium businesses. Malecki & Gorman (2001, in Brunn & Leinbach 1991, p.
91) divided the Internet hierarchy into five levels (Table 2-2). All networks exchange data
on the first level. The second level makes the transfer of data possible among cities
around the world. Regional networks comprise the third level, though Malecki and
Gorman warn, "they [regional network providers] may be a dying breed as they are
replaced by national providers" (Malecki & Gorman 2001, p. 92). Internet Service
Providers (ISPs) are the fourth level. Finally, Internet users are the fifth level.
Table 2-2. The hierarchy of Internet network interconnections
Level                       Providers                                               Example
Level 1: Interconnection    Network Access Points (NAPs), private peering points    Ameritech Chicago NAP
Level 2: National           National backbone operators                             SprintLink, MCI
Level 3: Regional networks  Regional Network Operators                              Erols, Rocky Mountain Internet, Inc. (RMII)
Level 4: ISPs               Internet Service Providers                              DialNet, bright.net
Level 5: Users              Business and consumer
"Middle-Mile" Network Links
These are the links of the "aggregators," firms that connect large data customers,
such as firms in office buildings and office parks, to local points of presence (POPs) of
backbone networks. Many utilities such as Gainesville Regional Utilities (GRU) and
Florida Power and Light (FPL) serve as middle-mile providers in this service area.
Middle-mile facilities connect the backbone fiber owned by large national
telecommunications firms, such as Sprint or MCI WorldCom, to regional networks that
may be owned by utility companies, or smaller telecommunication firms. Middle-mile
facilities are an integral part of the Internet's backbone hierarchy, providing linkage
between national/regional networks and local networks. A recent report by the Federal
Communications Commission (FCC) (2000) includes middle-mile facilities as one of
three main types of the Internet's network components (backbone facilities and last-mile3
facilities comprise the other two). Thus far, little or no research has been done to
analyze this scale or segment of the Internet.
The Last-Mile Connection: From Backbone to Computer
The "last-mile" connection refers to the connection between the Internet backbone
and user; it is the final physical linkage between user and network. A user's connection
to the Internet may be one of three types: dial-up, continuous, or wireless. Continuous
connection is the most efficient connection, enabling instant delivery of e-mail,
eliminating the need to tie up a phone line, and allowing businesses to advertise and
publish directly to the Internet and enable "real-time energy management" (Hurley &
Keller 1999, p. 3). Most households currently access the Internet through dial-up
connections, using a modem and a telephone line. This method is the cheapest way to
access the Internet. Users may not have the option to use a faster, more expensive type
of connection because access to sophisticated types of Internet infrastructure is not
offered in their neighborhood.
3 "Last mile" is the term used to describe the connection between the user and the
ISP, which is typically the slowest aspect of Internet access.
Integrated Services Digital Network (ISDN), developed in
the early 1980s to improve telephone service, is a technology introduced in the early
1990s that provides moderate bandwidth (64 to 128 Kbps4). This technology was
efficient for connecting to the Internet, though a slow and lengthy process. Soon after,
more technologies emerged: cable modems offering high bandwidth and continuous
connection (10 Mbps5 or 30 Mbps), and Asymmetric Digital Subscriber Line (ADSL), a
high-bandwidth copper-wire technology (8.192 Mbps/640 Kbps up). These new
technologies boasted higher bandwidth, which allowed for a higher data transfer rate
and volume. The connections were dedicated to data transfer, allowing the user to
maintain a constant, uninterrupted transfer.
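The practical effect of these bandwidth differences can be shown with simple arithmetic. The sketch below uses the nominal rates quoted above and a hypothetical 5-megabyte file. It assumes decimal units (1 megabyte = 8,000,000 bits) and ignores protocol overhead, latency, and line conditions, so the figures are idealized lower bounds on transfer time rather than measured performance.

```python
# Nominal downstream rates taken from the paragraph above, in kilobits per second.
RATES_KBPS = {
    "ISDN (64 Kbps)": 64,
    "ISDN (128 Kbps)": 128,
    "ADSL (8.192 Mbps)": 8_192,
    "Cable modem (10 Mbps)": 10_000,
}

# Hypothetical file size, chosen only for illustration.
FILE_SIZE_MEGABYTES = 5


def transfer_seconds(size_megabytes, rate_kbps):
    """Idealized transfer time: total bits divided by bits per second."""
    bits = size_megabytes * 8_000_000  # decimal megabytes to bits
    return bits / (rate_kbps * 1_000)  # Kbps to bits per second


transfer_times = {name: transfer_seconds(FILE_SIZE_MEGABYTES, rate)
                  for name, rate in RATES_KBPS.items()}
```

Under these assumptions the same file that occupies a 64 Kbps ISDN line for more than ten minutes moves over a 10 Mbps cable connection in a few seconds.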
Physical distance matters with many types of sophisticated telecommunications
infrastructure, such as digital subscriber line (DSL). The closer one is to the service
provider, the better the service (Moss 1998). With a full range of options for providing
high-bandwidth local access, it is clear that no single technology will be declared the
"all-around winner" (Hurley & Keller 1999, p. 37). High-bandwidth technologies remain
competitive, with user preference between cable modems and DSL not apparent.
Wireless is the newest form of access to the Internet. The development and deployment
of wireless service to provide mobile access hinges on the results of current FCC efforts
to open up radio-frequency spectrum for such services (NRC 2001). Currently, a
wireless connection to the Internet is not considered secure and may be decrypted by
hackers quite easily. This lack of security greatly deters wireless users from making
financial transactions or transferring valuable data via a wireless connection.
4 Kbps: kilobits per second (thousands of bits per second). Kbps is a measure of
bandwidth (the amount of data that can flow in a given time) on a data transmission medium.
5 Mbps: megabits per second (millions of bits per second). Mbps is a measure of
bandwidth on a transmission medium.
Whichever technology is used to connect to the Internet, the user is able to navigate the Internet,
check e-mail, and communicate with other users. As data are requested, users may
navigate from one network to the next to reach the data source, call up the data packets,
and navigate back through the Internet to complete the request.
The original nodes of the Internet were universities and research institutions invited
by the Department of Defense to participate in the ARPANET project; they included
prestigious research institutions such as MIT, Carnegie Mellon, and UCLA, to name a
few. These universities were given large grants in 1965 to create "centers of excellence"
computing research centers. These research centers, dispersed throughout the United
States, were connected to form ARPANET. It was the original users that transformed
ARPANET into the Internet we know today. The users of the network could create new
applications with few restrictions and had the incentive and ability to experiment with the
Internet to mold it to better meet their immediate needs, for example, building new
hardware or software, or using the existing infrastructure in new, improved ways (Abbate
Large businesses, universities, and larger institutions often have direct links that
allow the user to bypass the telephone network and connect directly to a metropolitan-
based network or to Internet backbones (Malecki & Gorman 2001). Langdale (1989)
explains that large global companies that lease sophisticated networks that utilize high-
speed circuits dominate international business telecommunications traffic, allowing the
firms to link their networks to networks housed in major industrialized countries. The cost
of communication networks is largely determined by the maximal capacities of networks,
but the traffic those networks carry depends on how heavily those networks are used.
Thus, increasing the efficiency of data transport would make the Internet less expensive
and more useful (Odlyzko 2000).
The Internet today is an amazing network, but the individuals, organizations,
government, businesses, and educational institutions incorporated within and linked by
the Internet make it the invaluable resource it has become. The network is a medium for
the exchange of information, data, and ideas among those who use it. The Internet is the
most influential advancement in the distribution and exchange of information since the
telephone (Moss & Townsend 2000).
In February 1994, the National Science Foundation (NSF) designated four nodes
as Network Access Points (NAPs): San Francisco, operated by PacBell; Chicago,
operated by Bellcore and Ameritech; New York, operated by SprintLink (this NAP is
actually in Pennsauken, New Jersey); and Washington, D.C., operated by Metropolitan
Fiber Systems. NAPs are "sites where private commercial backbone operators could
interconnect" (Boardwatch 1999, p. 13). When NSF completely transferred responsibility
and rights to the Internet to commercial entities in 1995, the networks interconnected
only at the NAPs. When the NSFNET backbone was shut down and transferred to the
commercial entities, the NAP architecture became the Internet (Boardwatch 1999). Each
of these four NAPs would be maintained and operated by telecommunication companies
rather than the NSF.
As the Internet continued to grow and develop, traffic increased dramatically, and
the NAPs became increasingly congested and utilized; demand for more interconnection
points began to increase. The core of U.S. Internet interconnection remains the four
"official" NAPs and includes other major connection points, the Metropolitan Area
Exchanges (MAEs) (Boardwatch 1999, p. 13). Thirty-eight of 41 major backbone
networks in the U.S. connect at both MAE East and MAE West (Malecki 2000). From
this, we can conclude that the importance of the NAPs has not declined. The MAEs and
NAPs were and are considered public facilities where backbones and ISPs could
interconnect and colocate at little or no cost. However, because a large number of
networks link at each NAP, congestion and inefficiency are common.
The solution to the demand for more efficient, faster connections was private
interconnection, originally termed "private peering." Network peers connected at
private locations rather than at the NAPs. The term private peering is sometimes
misused, describing network interconnections in general, whether the networks are
equals or gross unequals (for example, national Internet backbone provider networks
would not be equal to a local Internet service provider). Private interconnection is a
relationship between two or more ISPs in which the ISPs create a direct link between
each other and agree to forward each other's packets directly across this link instead of
using the standard NAPs or Internet backbone. Simply put, peering takes place
between network equals, and interconnection takes place between unequal networks,
with the weaker party paying for transit. Interconnection can involve more than two ISPs.
In this situation, all traffic destined for any of the ISPs is first routed to a central
exchange, which is called a peering point, and forwarded on to the final destination after
hitting the peering point. Private peering points function similarly to the larger
interconnection points because they provide interconnection between networks.
Colocation facilities operate on a smaller scale than the NAPs, IX facilities, and MAEs
with contracts and higher fees for the users (Boardwatch 2000). The commercial Internet
operates as a machine of collaboration and cooperation between networks. "Every ISP
network must inter-operate with neighboring Internet networks in order to produce a
delivered outcome of comprehensive connectivity and end-to-end service" (Huston
1999, p. 1).
This chapter has discussed the relevance of this dissertation in relation to current
concerns of the protection of critical infrastructure, particularly telecommunication
networks. A large disruption to the Internet would have serious repercussions because it is directly
tied to the economy. This dissertation determines the most important links and nodes to
the Internet backbone network by employing network analysis. It is hypothesized that the
links and nodes most critical to the network will have a high concentration of fiber
bandwidth and an equally high concentration of colocation and interconnection facilities.
Chapter 2 presents a review of literature relating to networks, geography and
telecommunications. Chapter 3 explains the data, analysis, and methodology used in
this research. Chapter 4 presents and discusses an unweighted analysis of the U.S.
Internet backbone network. Chapter 5 presents and discusses a weighted analysis of
the U.S. Internet backbone network. Chapter 6 presents a statistical analysis of network
measures, using indices derived from Chapters 4 and 5. Chapter 6 also summarizes the
findings and discusses future research possibilities of the U.S. Internet backbone network.
NETWORK CONCEPTS: A LITERATURE REVIEW
In order to fully understand the research implications and to identify related
research to further aid in answering the proposed questions, this chapter examines
relevant literature and reviews basic concepts in network analysis. The initial portion of
this chapter will focus on network concepts and constructs, with a brief overview of
networks. The second section reviews geographic literature of networks and information,
concentrating on network analysis, telecommunication infrastructure analysis and the
Internet backbone. The third section has two subsections: historical telecommunications
and the Internet, highlighting the importance of telecommunications to the city.
Network Concepts and Constructs
A short glossary of network terminology and concepts is provided below to
facilitate the discussions and analysis. These definitions serve as a point of
reference and help clarify various network concepts. A network can be defined most
simply as two or more nodes connected by one or more links: an interconnected system
of objects or people that consists of a set of links and nodes. Nodes and links are the main
components of a network: nodes (vertices) are connecting points or objects in a graph or
network, while links (edges) are the connections between them.
Graphs are abstract representations or models of a network. The terms vertices
and edges are most commonly used when describing a graph: a vertex refers to a node
and an edge refers to a link. In this text, vertices and nodes, and links and edges, are
used interchangeably. Graph theory applies abstract configurations
that consist of points and lines to study network properties. A directed graph has ordered
pairs of vertices connected by links with direction, while an undirected graph has
unordered pairs of vertices connected by links without direction.
The component to which a vertex belongs is the set of nodes that can be
reached from it by paths running along edges of the graph. A geodesic path is the
shortest path through the network from one node to another, while the diameter
is the length of the longest geodesic path between any two nodes (Kochen 1989). A circuit
indicates the flow of a path: circuits are paths or routes within a network
that lead back to their starting point.
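The definitions of component, geodesic path, and diameter can be illustrated with a minimal Python sketch. The six-node network below is hypothetical (it is not one of the networks analyzed in this dissertation), and the function names are chosen here only for illustration. A breadth-first search counts the links on the shortest path from one node to every node it can reach.

```python
from collections import deque

# Hypothetical undirected network: each key is a node and each value lists
# the nodes it is directly linked to. Node "F" is deliberately isolated.
network = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
    "F": [],
}

def geodesic_lengths(graph, source):
    """Breadth-first search: number of links on the geodesic (shortest)
    path from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def components(graph):
    """Group nodes into components: sets of mutually reachable nodes."""
    seen = set()
    result = []
    for node in graph:
        if node not in seen:
            members = set(geodesic_lengths(graph, node))
            seen |= members
            result.append(members)
    return result

def diameter(graph, members):
    """Length of the longest geodesic path between any two nodes of one component."""
    return max(max(geodesic_lengths(graph, n).values()) for n in members)

parts = components(network)
largest = max(parts, key=len)
print(len(parts), "components; diameter of the largest:", diameter(network, largest))
```

In this sketch the isolated node forms a component of its own, and the diameter is reported only for the largest component because geodesic paths do not exist between separate components.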
A network connection or the connectivity of a node describes the relationship
between nodes or links in a network, a topological description that specifies the
interconnections between nodes. Connections may be direct or indirect. Accessibility
describes the degree to which a node can be reached or accessed by
other nodes in the network, given their absolute or relative locations. Location-based
accessibility is the degree to which a node can be reached or accessed given its location
in relation to the other network nodes, as based on physical distance and spatial
The nodality of a node refers to the degree of a node's dominance within a
network. To measure the location of a network's components is to determine the
centrality of a node within the network (Kochen 1989, Perrucci & Potter 1989). A
gateway is a nodal characteristic that describes a main entry-and-exit point for a region
or network. A disconnected node or segment has been detached or removed from the
network or subnetwork.
Structural equivalence is a measure of the similarity in roles of nodes in a
network, through the determination of which nodes play similar roles in the network. For
example, in Figure 2-1, nodes V1 and V3 are structurally equivalent, playing similar roles
within their regional cluster of nodes as collector nodes, or regional hubs. Structural
holes are areas of no connection between nodes that could be used for advantage or
Figure 2-1. Network A
Network centralization is equivalent to a node's location or position in a network
considering all direct and indirect links, or from a multi-faceted standpoint, how a node is
characterized in terms of connectivity, accessibility, its propensity to cause disconnects
upon its removal or failure, and its importance in terms of adding redundancy or circuits
as a back-up node should other (nearby) nodes be removed or fail.
"Betweenness" is a measure of influence over what flows in the network, a
measure of power that a node has based on its relative location or position in the
network. For example, in Figure 2-2, node V3 is an important node in that it helps
connect two distinct regional clusters of lesser-connected nodes. Note that the removal
of node V3 would cause the network to become disconnected (Krebs 2004).
"Closeness" is a measure of how far any node is from any other node in the network.
For example, nodes V3, V7, and V8 in Figure 2-2 are at most two links away from any
other node in the network. Tier 1 in the nodal hierarchy in Figure 2-2 would be
composed of nodes V3, V7, and V8. The second tier would consist of the remaining
nodes, V1, V2, V4, V5, V6, and V9. Note that closeness is different from diameter:
the diameter (d=3 for this network) is the minimum number of links between the
two most distant points in a network (Krebs 2004).
Figure 2-2. Network B
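The ideas behind closeness and node removal can be sketched in a few lines of Python. The network below is hypothetical and is not a reconstruction of Figure 2-2: it consists of two triangular clusters joined through a single node "H". For each node the sketch reports the greatest number of links separating it from any other node (the sense of closeness used in the example above), and it then tests, by brute force, which single-node removals disconnect the network. The removal test is only a crude stand-in for betweenness, not a betweenness calculation.

```python
from collections import deque

# Hypothetical network (not Figure 2-2): two triangles joined through "H".
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "H"),
         ("H", "X"), ("X", "Y"), ("X", "Z"), ("Y", "Z")]

def build(links, removed=None):
    """Adjacency sets for the network, optionally with one node removed."""
    nodes = {n for link in links for n in link if n != removed}
    graph = {n: set() for n in nodes}
    for u, v in links:
        if removed not in (u, v):
            graph[u].add(v)
            graph[v].add(u)
    return graph

def distances(graph, source):
    """Links on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def max_distance(graph, node):
    """Greatest number of links from node to any other reachable node."""
    return max(distances(graph, node).values())

def disconnects(links, node):
    """True if removing the node leaves the remaining nodes disconnected."""
    rest = build(links, removed=node)
    start = next(iter(rest))
    return len(distances(rest, start)) < len(rest)

graph = build(links)
farthest = {n: max_distance(graph, n) for n in sorted(graph)}
critical = sorted(n for n in graph if disconnects(links, n))
print(farthest)
print("removal disconnects the network:", critical)
```

The largest value in `farthest` is the diameter of the network, while the smallest values identify the nodes that would occupy the top tier of a nodal hierarchy.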
"Boundary Spanners" are nodes that are more central than their immediate
neighbors whose connections are only local. In short, they are regional hubs or the
center or predominant node in a regional cluster or subnetwork (Krebs 2004).
"Peripheral Players" are nodes that are often connected to outside or external
networks that are not currently mapped, making them of greater importance than nodes
that are not directly linked to those external networks. Hence, their importance may be
understated (Krebs 2004, Kochen 1989).
Networks vary in nature, size, and purpose. They may be human, technical, or
natural; tangible or intangible; physical or virtual; and private or public. Several
disciplines study and utilize networks for various purposes, including transportation,
landscape ecology, geography, neurology, telecommunications, communications,
physics, computer science, economics, health care and medicine, electric and gas,
water distribution and resources, urban planning, mining, and geosciences. Networks
can be organized into various categories. The following section discusses physical and
virtual networks, technical networks, environmental and natural networks, social and
human networks, and private and public networks.
Physical and virtual networks
Physical networks are two or more nodes that are connected by a physical link,
such as roads, wires, corridors, streams, cables, or pipes. The U.S. Interstate Highway System is
an example of a physical network.
Virtual networks exist conceptually, rather than being physically real. Though a
virtual network may not have a physical composition, it still serves the same purpose as
physical networks: to connect various objects for the sharing of data and knowledge.
Networks can be both physical and virtual in nature, the prime example being the
Internet (www.webopedia.com). Some social networks might also be considered both
physical and virtual, with members meeting physically or being virtually connected by association.
Technical networks are used in science or industry and are of a mechanical nature.
They typically include sophisticated equipment that may be electronic or computerized.
An electric company providing power to a neighborhood is an example of a technical
network. The power-grid itself is public, using the power plant, high-voltage
transmission lines, power substations, transformers, power poles, and transformer
drums to move power across a network to consumers. All of the equipment is electrical,
mechanical, or computerized in nature.
Optical networks are high-capacity telecommunications networks based on optical
technologies and components that provide routing, grooming, and restoration at the
wavelength level as well as wavelength-based services. Optical networks are providing
more advanced capabilities as well as lower costs for the telecommunications industry.
"In the early 1980s, a revolution in telecommunications networks began that was
spawned by the use of a relatively unassuming technology, fiber-optic cable. Since then,
the tremendous cost savings and increased network quality has led to many advances in
the technologies required for optical networks, the benefits of which are only beginning
to be realized" (IEC 2002).
Natural and environmental networks
Natural networks include environmental networks. Because the environment encompasses
disease and epidemiology, the percolation of disease through a network falls into this category.
The transfer of biological agents occurs through natural networks. With the sudden
onset of Severe Acute Respiratory Syndrome (SARS) in early 2003, we have been
reminded of the dangerous ability of disease to travel rapidly through a network of
individuals or entities. The Centers for Disease Control and Prevention (CDC) advised against
unnecessary travel to those areas infected with SARS (CDC 2003), in an attempt to
prevent the transmission of the disease through transportation and social networks.
Other natural networks include hydrological networks. Hydrological networks describe
the movement of water and the interaction of lakes, streams, and rivers with each other
and with other environmental impacts.
Neural networks are composed of processing elements called units that respond in
parallel to a set of input signals given to each. Neural networks are most closely
associated with the study and research of the human brain, as the unit closely represents
the brain's neuron. In the medical field, neural networks are referred to as artificial
neural networks (ANNs); they have been in existence since the 1950s and are used in a
wide variety of applications, including speech recognition, which was the original intent
of creating an ANN. "Artificial neural networks (ANN) are a very simple model of the
brain" (O'Sullivan & Unwin 2003, p. 364). ANNs are now also used in geographical
applications. They are based on the idea that
"brainlike structure = intelligence." This type of network assumes that it can operate in
two types of environments: supervised or unsupervised. A supervised network is a
network that has been trained or programmed by a known set of data. An unsupervised
network operates in a more traditional form, in that it eventually "settles to a state such
that different combinations of input data produce different output combinations that are
similar to a clustering analysis solution" (O'Sullivan & Unwin 2003, p. 365).
Social and human networks
Social networks are formed as a means to communicate with and within a group.
The nodes in social networks represent people or groups, while the links represent
relationships or flows between them. Examples of social networks might include a
school, political organization, a firm, unionized workers, and friends. Other types of
human networks may not be as obvious as others. For example, transportation networks
could be considered human. Transportation networks are built by people for the purpose of
transferring goods and information. However, in order to create one network, the
disruption of another may occur. Road and railroad networks may cause landscape
fragmentation that might affect other types of networks. Recent research has focused on
the environmental impact of road networks, and further the impact upon animal behavior
and population (Carleton 2003). A review of relevant social network analysis follows later
in this section, within the review of complex network topology literature.
Private and public networks
Public networks can be used free of cost, while a private network charges some
type of user fee. Government agencies, businesses, and individuals can, and do, set up
their own private networks. Private networks allow for security and privacy by controlling
user access to the network. Private networks generally require a user fee. There are a
number of public networks, both large and small, that are commonly available to
government, business, and consumers in the USA. Some of the most common public
networks are the Public Switched Telephone Network (PSTN, wired telephones), Public
Wireless Voice Networks (PWVN, such as cellular and Personal Communications
Service [PCS]), Public Wireless Radio Paging Networks (PWRPN), and the Internet.
Private networks, such as government and corporate, may or may not connect to public
networks, such as the Internet. However, these public and private networks increasingly
overlap. Private and public networks may be physical, virtual, or both. A private virtual
telecommunication network allows for an individual or entity to remotely access a larger
network using public network infrastructure, while maintaining privacy and security by
encrypting data that is exchanged.
The "Small World" Phenomenon
Many have studied the small world phenomenon, including, but certainly not limited
to, Milgram (1967), Kochen (1989), Wasserman and Faust (1994), Ozana (2001), Albert
and Barabasi (2002a), and Watts (2003). The small world concept is that, though a
network may seem vast in size, the nodes in the network are connected by short paths.
Milgram (1967) pioneered the concept and its most popular theme, "six degrees of
separation." Milgram concluded that an average of about six links separated most
pairs of people in the U.S.
Complex Network Topology Literature
This group of network literature discusses the exploration of the topological
properties of real networks. Complex network analysis is used in a variety of disciplines
to understand various networks and real systems (Albert & Barabasi 2002b, p. 49).
Albert and Barabasi (2002b) give a very detailed review of complex network research in
their paper titled Statistical Mechanics of Complex Networks, published in Reviews
of Modern Physics. Newman (2003) also gives an excellent review of complex network
analysis in his publication The Structure and Function of Complex Networks. These two
papers were used as a guide for the following review of examples of complex network
The social network serves as a major source of social capital. Being part of a
social network means to communicate with and within a group. Examples of social
networks might include a school, political organization, a firm, unionized workers, and
friends. Two techniques for approaching social networks are described by Watts (2003,
p. 48): network structure and social structure. Sociologist Mark Granovetter (Granovetter
as cited by Watts 2003, p. 49) concluded that effective social coordination does not
emerge from strong ties to a social network, but rather from occasional weak ties.
Granovetter described in his 1973 paper, The Strength of Weak Ties, a foresight of what
is now described by Watts (2003, pp. 49-50) as the new science of networks. Watts
concludes that "social network analysis still has one major glitch; there is no dynamics"
(Watts 2003, p. 50). Thus the measure and study of social networks is complex and
problems are approached differently, depending on the nature of the network and its
Actor collaboration network
One of the most analyzed social networks is the movie actor collaboration network.
This network contains all movies and the casts of these movies since the 1890s. The
network is continuously expanding and updated and is based on the Internet Movie
Database. Watts and Strogatz (1998); Newman, Strogatz, and Watts (2001); Barabasi
and Albert (1999); and Albert and Barabasi (2000) have all used the movie actor
database. Watts and Strogatz (1998) reported that in 1998 the network had 225,226
nodes (actors). By May of 2000, the number of nodes (actors) had grown to 449,913
according to Newman, Strogatz, and Watts (2001). When two actors work together in a
film, they share a link.
Science collaboration network
The science collaboration network is similar to the movie-actor network. When two
scientists work together, they are connected nodes. Newman (2001a, 2001b, 2001c)
studied four databases during a five-year time frame that included physics, biomedical
research, high-energy physics, and computer science to determine the topology of this
network. Each of the networks shows a small average path with high clustering
coefficients (Albert & Barabasi 2002b). The collaboration network of mathematicians and
neuroscientists that published between 1991 and 1998 was studied by Barabasi et al.
(2001); they were found to have consistent degree distributions with other collaboration
Sexual contact network
Liljeros et al. (2001) have investigated sexually transmitted diseases (STDs),
including AIDS. They studied a network based on the sexual relationships of 2810
individuals. The data was obtained from a Swedish survey conducted in 1996. The
distribution of sexual partners was studied for a year. The spread of STDs through the
network was studied (Albert & Barabasi 2002b). Because the average edge in
the network has a relatively short span, they analyzed the distribution of partners over a
The metabolism of 43 organisms was studied by Jeong et al. (2000). In this
project, networks in which the nodes were substrates and the links were chemical
reactions represented the organisms. The average path was found to be roughly the
same in each of the organisms. Wagner and Fell (2000) looked at the clustering
coefficient while focusing upon the energy and biosynthesis metabolism of the E. coli
bacterium. Their results show that an undirected version of this substrate graph (network)
has a small average path length with a large clustering coefficient (Albert & Barabasi
2002b). Protein-protein interactions within a cell were also considered in the analysis, as
they help to characterize the cell network. The proteins represent nodes that are
connected if they bind together.
The nodes in a citation network represent published scientific articles, and links
represent a reference to a previously published article. This network was studied by
Redner (1998). The network included 783,339 papers cataloged by the Institute for
Scientific Information and 24,296 papers that were published in Physical Review D
between 1975 and 1994. Following Redner, Vazquez (2001) did a similar study using
the citation network. Vazquez extended the study to include outgoing degree distribution
and found an exponential tail.
Ferrer i Cancho and Sole (2001), Yook, Jeong, and Barabasi (2001b), and
Steyvers and Tenenbaum (2001) are amongst those researchers that study the complex
networks formed by human language. Steyvers and Tenenbaum's results indicate that
languages form networks and dynamics not so different from other networks (Albert &
Barabasi 2002b, p. 53). Ferrer i Cancho and Sole (2001) created a network using the
English language, based on the British National Corpus. The nodes represented words
and were linked to each other if they appeared next to each other, or were one word apart
from each other, in a sentence (Albert & Barabasi 2002b, p. 53). The network consisted of
440,902 words. Ferrer i Cancho and Sole (2001) found that the average path length
was small, there was a high clustering coefficient, and there was a two-regime power-
law degree distribution (Albert & Barabasi 2002b, p. 53). Yook, Jeong, and Barabasi
(2001b) used a different network for their study of the linguistic network. For their
network, two words were linked if they were synonyms according to the Merriam-Webster
Dictionary. Their results show a large cluster of 22,311 words out of a total of
Ecologists study food webs or food chains to determine the network relationships
between various species. In a food network, nodes represent the species and the links
would be the predator-prey relationships between them (Albert & Barabasi 2002b).
Williams et al. (2000) recently studied the topology of some of the largest food webs;
Skipwith Pond, Little Rock Lake, Bridge Brook Lake, Chesapeake Bay, Ythan Estuary,
Coachella Valley, and St. Martin Island. Though the webs were comprised of very
different species in different habitats, each indicated that species in habitats are three or
fewer links from each other (Williams et al. 2000). The research of Montoya and Sole
(2000), and Camacho et al. (2002a) confirmed that the food webs show highly clustered
nodes. Montoya and Sole's research focused on Ythan Estuary, Silwood Park, and Little
Rock Lake. Two of their research areas overlapped with that of Williams et al. (2000).
Camacho et al. (2002a, 2002b) found that an exponential fit worked well, following the
well-documented existence of key species in the food web. They represent a common
feature of scale-free networks, hubs. Forman and Spearling (2002) explore road
ecology. Forman and Spearling discuss the vast network of roads that billions utilize
daily. They point out that until now, there has been little or no network theory applied to
road networks and ecology. The road network and landscape indeed form a complex
network. Forman and Spearling did a study of the 4 million miles of public roads in the
U.S. and determined how much area they ecologically affect. They concluded that about
one fifth of the total U.S. area, 20%, is directly affected ecologically by our road system.
Telephone call networks
The long-distance telephone call network has been studied by Abello, Pardalos,
and Resende (1999) and Aiello, Chung, and Lu (2000), amongst others. They
constructed a large, directed graph using long-distance telephone call patterns. Phone
numbers represent nodes, while every complete call represents a link. These
researchers used the calling network based on the data from one day. They concluded
that the degree distributions of the outgoing and incoming edges followed a power law
with exponent 2.1.
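The construction of such a call graph can be sketched in Python. The call records below are hypothetical placeholders, not data from the cited studies. Each record is a directed link from caller to callee, and the degree distribution counts how many phone numbers have each in-degree and out-degree. A power law would mean that the number of nodes with degree k falls off in proportion to k raised to a negative exponent (about -2.1 in the studies cited above).

```python
from collections import Counter

# Hypothetical call records (caller, callee); not data from the cited studies.
calls = [
    ("555-0101", "555-0102"),
    ("555-0101", "555-0103"),
    ("555-0101", "555-0104"),
    ("555-0102", "555-0103"),
    ("555-0104", "555-0103"),
]

# Every completed call is one directed link: it adds to the caller's
# out-degree and to the callee's in-degree.
out_degree = Counter(caller for caller, _ in calls)
in_degree = Counter(callee for _, callee in calls)

# Phone numbers are the nodes; a number that never placed (or received)
# a call has out-degree (or in-degree) zero.
numbers = {number for call in calls for number in call}

# Degree distributions: how many nodes have each degree value.
in_distribution = Counter(in_degree[n] for n in numbers)
out_distribution = Counter(out_degree[n] for n in numbers)

print("in-degree distribution:", dict(in_distribution))
print("out-degree distribution:", dict(out_distribution))
```

With only a handful of records the distribution says nothing about a power law; the sketch shows only how the in-degree and out-degree counts would be tabulated from one day of calling data.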
Power and neural networks
The U.S. power grid consists of generators, transformers, and substations, which are the
network nodes. The links are the high-voltage transmission lines. With the power outage
affecting the northeastern U.S. in the summer of 2003, we saw the interconnectedness of
this network. The degree distribution of the power grid is consistent with an exponential
(Albert & Barabasi 2002b, p. 54). Watts and Strogatz studied the neural network of the nematode worm, in which
the nodes are neurons and a link exists where two neurons are joined by either a synapse or a gap junction
(Albert & Barabasi 2002b, p. 54). In their research, Watts and Strogatz (1998) found that
for both networks (power and neural) the average path length was approximately equal
to that of a random graph of the same size and average degree, and the clustering
coefficient was much higher (Albert & Barabasi 2002b, p. 54).
World Wide Web & the Internet
One of the most recent complex networks to be examined is the Internet. As was
introduced earlier, geographers analyze networks, and the geography of networks is
often relevant to other disciplines. While geographers were working on early network
analysis of transportation networks using graph theory (Kansky 1963, Garrison 1960,
Haggett & Chorley 1969), Erdos and Renyi (1960) were focused on theoretical work on
complex networks. They modeled large networks using algorithms where N nodes were
randomly connected according to probability P. They found that the nodes were
connected in a manner that followed a Poisson distribution¹ (Albert & Barabasi 2002b,
p. 49). The network model created by Erdos and Renyi was used widely in several
disciplines analyzing networks. The most closely related research of this group would be
Internet topology generators (Radoslavov et al. 2000).
¹ The Poisson probability distribution is used to analyze how frequently an outcome
occurs during a certain time period or across a particular area. Other geographic
applications of Poisson involve the analysis of existing frequency count data to
determine if a random distribution exists (McGrew & Monroe 2000).
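The random network model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the values of N and P, the seed, and the function name are arbitrary choices made here. Each of the N(N-1)/2 possible links is created independently with probability P, so the expected degree of a node is P(N-1); for a Poisson distribution the variance of the degrees should be close to their mean.

```python
import random

def random_graph_degrees(n, p, seed=0):
    """Link each pair of n nodes independently with probability p and
    return the resulting degree (number of links) of every node."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

N, P = 500, 0.01  # illustrative values only
degrees = random_graph_degrees(N, P)

mean_degree = sum(degrees) / N
variance = sum((d - mean_degree) ** 2 for d in degrees) / N

print("expected mean degree:", P * (N - 1))
print("observed mean degree:", mean_degree)
print("observed variance:", variance)
```

Because no node is favored when links are assigned, highly connected hubs are very unlikely in this model, which is one reason the power-law findings discussed later in this section were notable.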
According to Barabasi et al. (2001), the absence of topological data in the analysis
of complex networks makes random network models the most often applied method of
network simulation. As computer technology advanced, and data for real world networks
became more available, several empirical findings emerged. Three network
characteristics resulted most often from complex network analysis: short average path
length, high level of clustering, and power law and exponential degree distributions
(Albert & Barabasi 2002b, pp. 48-49). A short average path indicates a short distance
between nodes in a network, while topologically close nodes that are well connected
form clusters. In 1998, Watts and Strogatz formalized this cluster concept for complex
networks using several large data sets. The real world networks they analyzed were not
completely random but instead displayed clustering at the local level. Local clusters
linking to other local clusters formed "small worlds." This analysis was followed by
studies performed by Albert and Barabasi (2002b) and Adamic and Huberman (1999),
which concluded that when the WWW is studied as a graph it follows a power-law
distribution² rather than a Poisson or exponential distribution.
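Two of the characteristics named above, average path length and clustering, can be computed directly from a list of links. The Python sketch below uses a small hypothetical network and one common definition of clustering (the fraction of a node's neighbor pairs that are themselves linked, averaged over all nodes); the cited studies may use variants of these measures.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network stored as adjacency sets.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def local_clustering(graph, node):
    """Fraction of pairs of the node's neighbors that are linked to each other."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)

def average_clustering(graph):
    return sum(local_clustering(graph, n) for n in graph) / len(graph)

def distances(graph, source):
    """Breadth-first search: links on the shortest path to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total = count = 0
    for source in graph:
        for target, d in distances(graph, source).items():
            if target != source:
                total += d
                count += 1
    return total / count

print("average clustering:", average_clustering(graph))
print("average path length:", average_path_length(graph))
```

A small world, in the sense used above, would pair a clustering value much higher than that of a random graph of the same size with an average path length that stays about as short.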
Albert and Barabasi (2002b) have described research of the Internet in two realms:
the World Wide Web and the Internet. Albert and Barabasi (2002b) label the documents
(web pages) of the WWW as the nodes and hyperlinks (URLs) as links. Lawrence and
Giles (1998, 1999) have estimated the size of this network as having close to one billion
nodes based on 1999 data. Network research of the WWW has increased as the
network experienced rapid growth, and after it was realized that the distribution of the
web pages "followed a power law over several orders of magnitude" (Albert & Barabasi
2002b, p. 49). Albert, Jeong, and Barabasi (1999), Lawrence and Giles (1998, 1999),
Adamic and Huberman (2000), and Adamic (1999) are a few researchers among the
many that have studied the complex network topology of the WWW and Internet.
According to Albert and Barabasi, the topology of the Internet is studied at two levels: the
router level and the interdomain level (Albert & Barabasi 2002b, p. 49). At the router level, all nodes are
routers and all links are physical connections between them. At the
interdomain level, each domain, consisting of hundreds of routers and computers, is represented by a
single node (Albert & Barabasi 2002b, p. 52). The interdomain level and the router level
have both been studied by Faloutsos et al. (1999), who concluded that in each case the
degree distribution follows a power law. The connectivity of the routers was mapped by
Govindan & Tangmunarunkit (2000). Yook et al. (as cited in Albert & Barabasi 2002b, p.
52) and Pastor-Satorras et al. (as cited in Albert & Barabasi 2002b, p. 52) have confirmed
in their studies of the Internet that the network does display clustering and small path length.
² A power law implies that small occurrences are extremely common, whereas large
instances are extremely rare. A function, f(x), is a power law if the independent variable, x,
is raised to some fixed power.
The majority of research on complex networks revolves around abstract or
theoretical networks in which geography is not relevant. But networks do impact geography,
and vice versa. At the same time, the Internet is dramatically affecting the city, making
research of the Internet's geography relevant and important. The following research will
contribute directly to geographic literature and research of the Internet backbone
network, the effect of the Internet upon cities, and the study of complex networks.
Modeling Networks with Geographic Information Systems
A geographic information system (GIS) can be used to model a network. There are
various GIS structures that can be used as tools in modeling linear features, including
coverages, geodatabases, geometric networks, logical networks, and optical networks.
GIS-based modeling programs tie linear features to spatial coordinates, unlike other
Network modeling is dominated by vector GIS, though it is possible to model most
networks using raster-based GIS (Zeiler 1999, Malczewski 1999). Bernhardsen (2002)
discusses raster connectivity operations, a process that requires discrete cell-by-cell
displacements that originate from a single starting point. The cells must contain
values that indicate how easily one can move across the surface; that is, the raster
cells represent a friction surface. It is easier to model path attributes such as direction
and flow in a vector GIS. The grid cells used in raster only approximate the exact shape
of a line in a network, direction is not explicitly given, and line and node attributes must
be stored as a separate layer (Bernhardsen 2002).
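The idea of cell-by-cell displacement over a friction surface can be sketched with a least-cost search. The Python example below is an illustration under stated assumptions, not the procedure described by Bernhardsen or implemented in any GIS package: the grid values are hypothetical, movement is limited to the four orthogonal neighbors, and the cost of a move is taken to be the friction value of the cell being entered. The search technique is Dijkstra's algorithm.

```python
import heapq

# Hypothetical friction surface: each value is the cost of entering that cell.
friction = [
    [1, 1, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]

def least_cost(friction, start, goal):
    """Dijkstra's algorithm on a raster: accumulate friction cell by cell
    from a single starting cell until the goal cell is reached."""
    rows, cols = len(friction), len(friction[0])
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return cost
        if cost > best[(r, c)]:
            continue  # stale heap entry; a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + friction[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return None  # goal cannot be reached

# Cost of travelling from the top-left cell to the top-right cell.
print(least_cost(friction, (0, 0), (0, 3)))
```

In this grid the cheapest route detours around the high-friction cells rather than crossing them, which also shows the limitation noted above: the raster yields a sequence of cells, not an explicit line with direction and attributes.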
GIS-based systems enable the user to take advantage of dynamic segmentation.
This is an extremely important feature in building a network model. Dynamic
segmentation is a two-step process performed on a spatial data set comprised of linear
features. First, a route system is created by associating adjacent line segments into one
or more groups that have a definite linear sequence. Second, descriptive information is
associated with the route system by referencing distances from the starting point of each
route. Dynamic segmentation allows tiny areas along a line feature to be referenced
without actually breaking that line into smaller pieces. This means that linear distances
can then be calculated directly from the routes and associated attributes (Northwest GIS
Services 2002). Dynamic segmentation uses a linear referencing system (linked to
geographic coordinates) to define a common datum for referencing the linear lines
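The two steps of dynamic segmentation can be sketched in Python. Everything in the example is hypothetical: the route coordinates, the event table, its attribute values, and the function names are assumptions made for illustration and do not reflect the data model of any particular GIS product. The route is an ordered sequence of vertices, and attributes are stored against distances (measures) from the start of the route rather than by splitting the line.

```python
import math

# Step 1 (hypothetical route): adjacent line segments in a definite
# linear sequence, given as ordered (x, y) vertices.
route = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# Step 2 (hypothetical event table): attributes referenced by distance from
# the start of the route as (from_measure, to_measure, value).
events = [(0.0, 4.0, "conduit"), (4.0, 11.0, "aerial")]

def locate(route, measure):
    """Return the (x, y) point lying at the given distance along the route."""
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        if measure <= travelled + length:
            t = (measure - travelled) / length  # fraction along this segment
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += length
    raise ValueError("measure lies beyond the end of the route")

def attribute_at(events, measure):
    """Look up the attribute in effect at a measure without splitting the line."""
    for start, end, value in events:
        if start <= measure < end:
            return value
    return None

print(locate(route, 8.0), attribute_at(events, 8.0))
```

The event boundary at measure 4.0 falls partway along the first segment, yet the route geometry is left untouched, which is the property described above.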
O'Sullivan and Unwin (2003) explain that while software packages such as ESRI's
Network Analyst are showing great promise in the realm of network analysis, a complete
comprehensive tool kit that will address the complexity of line objects and the advanced
mathematical concepts needed for analysis is still years away. This is because statistical
approaches to lines, as well as graphs, have had only limited success. In agreement,
Malczewski (1999) notes that some researchers contend that there are operational
limitations on the use of optimization models for spatial decision analysis in a GIS
environment. Malczewski maintains, however, that although GIS presently excels in data
gathering and visualization of the results, it can be fully integrated to provide a powerful
tool for spatial decision support in multicriteria decision making.
GeoDatabases and Geometric Networks:
Geometric networks are networks that model linear systems such as utility
networks and transportation networks (MacDonald & ESRI 2002). They support a rich
set of network-tracing and solving functions. Geometric networks consist of edge
network features and junction network features (Zeiler 1999). Edge elements are
connected to other edge elements via junctions. There are two types of network
features: simple and complex. Simple network features correspond to a single network
element, while complex features correspond to more than one network element.
Principal benefits to the geometric network model (Zeiler 2002, p. 128) include the
following:
* Editing networks is simple. When a user adds network features, one can ensure
that they are properly connected to the rest of the network with network
* Network features can represent complex parts of a network, such as switches. This
simplifies the editing process and allows one to create maps of a higher quality
with fewer features in one's network representation.
* A suite of simple and advanced network analysis solvers is built into ArcInfo, ready
to use. Network analysis is fast even on very large datasets.
* Networks can be versioned. Multiple users can simultaneously edit the same large
network in compliance with their organization's work-flow practices.
The geodatabase, part of ESRI's ArcGIS software, is a unique data format that is
similar to the coverage data model. It is a storage mechanism for spatial and attribute
data that contains specific storage structures for features, collective features, attributes,
relationships between attributes and relationships between features.
There are two main concepts to understanding a geodatabase:
* A geodatabase is a physical store of geographic information inside a database
* A geodatabase has a data model that supports objects with attributes and
behavior. Behavior describes how a feature can be edited and displayed. (ESRI
A geodatabase allows multiple users to work from it
simultaneously. Geometric networks are created using geodatabases. The data and
network functions and flows and relationships are used to build a geometric network
model through the geodatabase. Given the capabilities and sophistication of the
geodatabase and geometric network models, it would be ideal to build a
telecommunication infrastructure data model using these tools, provided that flow or line
data are available. The geometric network model allows several data types to be incorporated
into the model, which is what the current project calls for. Several types of
telecommunications data with different characteristics and capabilities are being studied.
The model would allow not only for organization and a model of the data, but simulation
exercises and practices that would not be possible in non-GIS supported network
Geographic Literature of Networks & Information
Geographic Network Analysis
The Internet is a data/information transport network with the ability to connect
places that are geographically separated, moving data from node to node, user to user,
service to service, workstation to workstation. Though geographers have a long history
of applied network analysis (e.g., Lalanne 1863), relatively little has been done on the
geography of the Internet. Geographic research has mainly focused on transportation
networks. In 1961 Garrison and Marble published their research findings on the U.S.
transportation system in The Structure of Transportation Networks. They concluded that
transportation structure is dependent upon the characteristics of the location housing the
network. Garrison and Marble (1961) also incorporated the work of fellow geographer
Brian Berry (1960) into their analysis by utilizing his measurements of technological and
demographic factors. Berry had done substantial research on networks and economic
variables, incorporating technological and demographic variables into his research. He
synthesized statistical measurements of levels of development to reveal the basic
factors underlying variations in the measurements of development. Partnering with
Berry, the researchers were able to incorporate national development into their analysis
using regression methods. They found that technological development was the major
determinant of network structure and that physical characteristics of a location are less
significant in explaining network structure than level of development of a location (Taaffe
& Gauthier 1973, p. 112). Kansky continued the research of Garrison and Marble (1961)
and Berry (1960) in his 1963 paper titled "The Structure of Transportation Networks."
Adams (1971) followed the U.S. highway network research with an analysis of the
domestic airline network. Adams used matrix methods to study airline growth and
connectivity. Nystuen and Dacey (1968) expanded the research agenda to include
telephone networks and other types of fixed infrastructure. Taaffe and Gauthier (1973)
followed with a text that demonstrated and explained how geographers study
transportation systems. In 1977 Haggett, Cliff and Frey published two volumes
explaining locational analysis and methods that are still applied in geography today. The
major contribution of this research was the development of spatial models for network
structure relating to location, density, and change over time. Haggett, Cliff and Frey also
explored network nodes, and the hierarchical structures they form within networks.
Geographers have also contributed to network analysis in related fields. Mitchelson
and Wheeler (1994) illustrated the importance of information flows through the U.S. in
terms of the global economy. Longcore and Rees (1996) have shown the importance of
telecommunication infrastructure in financial districts. Hepworth (1990, 1991) has studied
the Geography of the Information Economy. He concluded that IT convergence has led
to the centralization of information activity while communication between locations has
enabled the decentralization of knowledge-creation.
Using FedEx geographic delivery data, Mitchelson and Wheeler (1994) explored
the relationship between information flow and the U.S. city hierarchy within the context of
the global information network. They identified deregulation, globalization,
demassification, and vertical disintegration as important factors in promoting
unstable economic conditions. These conditions are dependent upon the information
economy, and the exchange of information is critical when instabilities arise. The
Internet is a tool to transfer information, data, and ideas, and acts as a stabilizing force
in the global economy. Mitchelson and Wheeler's analysis can be applied to the Internet
to give insight into information flows and the spatial structure of the information economy,
just as the FedEx delivery system is used to establish a domestic hierarchy. Longcore and
Rees (1996) built upon the work of Moss (1991, 1998), Castells (1989, 1993, 1996, 1997,
1998, 1999), Sassen (1991, 1994, 1995, 1996, 1999, 2000), Dicken (1994, 1998) and
others to study information technology and networks at the local level. Longcore and
Rees (1996) used the financial district in New York City to assess inter-urban information
flows. They found that the decentralization of central city office activity was enabled by
electronic communications and concluded a new urban hierarchy was emerging, based
on inter-urban information flows (Longcore & Rees 1996). However, they also concluded
that only global cities could support a sufficient concentration of telecommunication
infrastructure. The demand for telecommunication infrastructure would exist in larger
cities, implying that infrastructure capacity is reflective of underlying market conditions
and position in the global hierarchy.
More recently, the information and economic flows of e-commerce3 have been
studied. Leinbach and Brunn (2001) address the rapid growth of IT4 sectors that demand
a technically trained and highly skilled workforce. They conclude that the major cost
burden of communication infrastructure has fallen on the shoulders of the private sector.
Button and Taylor (2001) and Kenney and Curry (2001) illustrate the importance of the
Internet and e-commerce. They both conclude that the Internet is an important tool for
reducing transfer costs. Goodchild (Leinbach & Brunn, 2001) addresses the location
theory implications of the Internet and e-commerce and concludes, "The Internet is more
than just another communications device. It is a newly developed space with the power to
give rise to novel forms of human social interaction in almost any area of human
endeavor, commercial, or otherwise" (Goodchild 2001, as quoted in Leinbach & Brunn
2001, pp. 63-63). Malecki and Gorman (2001) study the physical structure of the
Internet and the importance of geography asserting that the Internet illustrates "both old
and new geographies" (Malecki & Gorman 2001, as quoted in Leinbach & Brunn 2001,
p. 103). Still others have studied e-commerce in firm, regional, and global contexts:
Aoyama (2001), Cobb (2001), Coe and Yeung (2001), van Geenhuizen and Nijkamp
(2001) and Langdale (2001).
Cukier (1998) tackles the geography of the Internet on a global scale, concluding
that in a postmodern world of consumerism and industry, geography matters; but in a
digital economy, information is the main product of value, and connectivity is what really
matters. Recent claims have touted the "death of distance" (Cairncross 1997) in
business. President Bill Clinton's 1998 address to the United Nations proclaimed the
Internet is responsible for "the death of distance," and he asked the United Nations to
support the new technology [the Internet] (Clinton 1998). Dodge and Kitchin (2001) have
3 E-commerce refers to the exchange of goods and services via the Internet.
4 IT refers to Information Technology
continued to stress that geography remains important despite these claims. They detail
a literal, conceptual, and metaphorical mapping of information and communication
technologies and cyberspace and conclude that even in cyberspace, geography matters.
Warf and Purcell (2001) contend that geography and location remain relevant
because the Internet exhibits a definite spatial structure that
reinforces existing relations of wealth and power. They acknowledge that "though
deregulation and digitization have severely attenuated the linkages between money and
space," global money does not "presuppose the disappearance of the nation-state, but
rather a rearticulation of its functions" (Warf & Purcell 2001, p. 240).
When users are browsing the web, traveling from site to site and location to
location, geography is, to the user, of little relevance. However, the geography of the
Internet's infrastructure is of great relevance. Wilson (2001) reminds us that seeking
territory in cyberspace has both "metaphorical and real geographic elements." Wheeler,
Aoyama and Warf (2000) have produced a publication that concentrates on the
geographic distribution of telecommunications and discusses how changes and
innovations in the economic system are catalyzed by telecommunication networks. Their
publication includes descriptions of how telecommunications have brought about the
restructuring of cities such as Atlanta, Phoenix, and Sunderland, England. They cover
the geography of Internet real estate, telecommuting, and urban planning and attribute
changes in the economic system to the heavy influence of telecommunication networks.
Recently there have been two ideas about the effects of telecommunications on
cities. The first idea is that information transfer will replace distance, causing the death
of cities (Gilder 1995). "Some social theorists argue that new information technologies
will inevitably lead to the economic decline of cities as electronic communications make
it possible to replace the face-to-face activities that occur in central locations" (Moss
1998). "We are headed for the death of cities" (Gilder 1995). The second idea is a little
more rational, given that cities continue to experience population growth:
Telecommunications technologies are not a replacement for personal interactions, but
It is also possible that telecommunications are not a substitute for face-to-face
interactions, but in fact these two forms of information transmission are
complements. If they are complements, then we should expect cities and [selected
urban] space to get more important as information technology improves. (Moss
The implication is that telecommunication infrastructures are likely to reinforce existing
trends rather than create divergent trends.
The analysis of transportation and telephone networks has been an important
research topic in geography for some time. Nonetheless, there have been comparatively few
studies on the Internet and Internet infrastructures. Geographic analysis of the Internet
has increased in recent years, though mostly describing the growth of Internet hubs and
capacity and the geographic distribution of networks and the Internet's users and traffic.
Little emphasis has been placed on the complex connectivity of the Internet from a
network analysis standpoint, particularly the Internet backbone network.
Geographic Research of Telecommunication Infrastructures
Geographers have begun to analyze the Internet and telecommunication-related
infrastructures, including colocation facilities, Network Access Points (NAPs),
Metropolitan Area Exchanges (MAEs), Internet Exchange (IX) Points, marine
cable landings, Points of Presence (POPs), Internet backbones, fiber routes, and cellular
towers. Recent work has also examined wireless structures (Gorman & McIntee 2003),
Web content and information production and distribution on the Internet
(Zook 2001, Wilson 2002), and the locational attributes of colocation facilities
The colocation industry emerged as demand for the interconnection of
telecommunication networks rose dramatically with the growth and proliferation of the
Internet. These facilities serve as physical interconnection hubs for ISPs, Internet
backbones and servers and are known by many nicknames: telehouses, telecom hotels,
and Internet hotels. The location characteristics of other types of interconnection hubs,
NAPs, MAEs, IX Points, and marine cable landings have also been examined (McIntee
2001). These interconnection facilities are clustered in cities rich in telecommunication
infrastructure, specifically fiber-optic networks, connecting the networks. Evans-Cowley,
Malecki and McIntee (2002), Malecki (2002), and Malecki and McIntee (2003) have
further explored the colocation industry and the geographic location of these facilities
and their effect on urban places and urban structure. Telegeography (2001b) has also
contributed information on the geographic distribution of the colocation industry by
compiling datasets of colocation facilities.
Point-of-presence (POP) facilities are a type of infrastructure that allows for the
connection between local Internet service providers and Internet backbones. Grubesic
and O'Kelly (2002) found that the greater San Francisco area led all U.S. metro areas in
the number of POPs in 2000, likely attributable to the high concentration of Internet
networks housed in the city.
Another important telecommunication infrastructure that has been studied by
geographers is cell towers. Gorman and McIntee (2003) found a strong and significant
relationship between cell towers and the volume of data traffic and the location of
colocation facilities within a C/MSA, implying that market size and urban growth are key
in the location of cell towers. In 2001, 65,000 cell towers existed in the U.S. Of that
number, 41,204 were located in C/MSAs (Gorman & McIntee 2003).
Web content, hosting facilities, and location of information production have also
been studied in geographic contexts (Malecki 2002, The Economist 2001). Zook (2001)
has explored the physical locations of adult video content providers vs. online content
providers. He concluded that there is a "stronger connection between Internet content
and information-intensive industries than between the Internet and the industries
providing the computer and telecommunications technology necessary for the Internet to
operate" (Zook 2000, pp. 411-412). Wilson (2002) has analyzed the geographic location
of virtual casino domains. He found that in 2001, the U.S. led the distribution of casino
domains, housing roughly 25% of the world's casino sites; but the urban distribution
within the U.S. was widely dispersed compared to the Internet industry at large.
Considerable research has been done into the effects of new communication
technologies on cities and the urban hierarchy. These studies revealed that
communications infrastructure has disproportionately agglomerated in the largest
metropolitan regions (Malecki & Gorman 2001, Malecki & McIntee 2000, McIntee 2001,
Moss & Townsend 1998, Wheeler & O'Kelly 1999), a pattern that reinforces the
predominance of those metropolitan areas within the urban hierarchy. This concentration
of infrastructure in the largest cities confirms the theory that telecommunications
infrastructure and technology will not bring the decline of cities, but rather complement
cities in their attempt to stay viable as centers of commerce. Although past research on
telecommunications includes the networks that comprise the Internet, little is presented
other than the topology of the nodes and links (Hepworth 1990, Kellerman 1993, Malecki
& Gorman 2001). Topics of importance are included in Brunn and Leinbach's (1991)
Collapsing Space and Time: Geographic Aspects of Communication and Information:
geography and communications, information economies, communications, technologies,
and regional development, and social dimensions of information and communications.
Longcore and Rees (1996) studied city structure change, influenced by the most recent
changes in information technology, using Manhattan as a case study. Although their
research concluded that the financial district land market might
cause the tightly focused financial district to demonstrate geographical flexibility, they
recognize the importance of face-to-face contact, and proximity to sophisticated
telecommunications infrastructure (Longcore & Rees 1996).
Geographic Research of the Internet Backbone
Within the past decade various researchers and organizations have looked at the
Internet backbones. Most of this research has focused on the Internet backbones in the
U.S., though some has discussed the structure of the Internet in Europe. The following
section reviews geographic literature of the Internet backbone network.
Telegeography (2000, 2001) reported that the international backbones with the
largest bandwidth capacity in 2000 were, not surprisingly, located between London and
New York (26680.5 Mbps) and between London and Paris (24340.5 Mbps5). The
backbone link between San Francisco and Tokyo had the largest bandwidth capacity
between North America and Asia in 2000 (capacity 7550.0 Mbps) (Telegeography 2000,
2001). Telegeography (2001) has also reported that New York serves as the Internet's
most global metro area, directly connected to 71 countries in 2001. Five of the top-ten
cities cited in that report were intercontinental backbones located in the U.S. This
reinforces the theory that Internet traffic is heavily reliant upon the U.S. as a centrally
located switching hub in the global telecommunications network.
Between 1997 and 1999, the U.S. experienced large, rapid growth in the Internet
backbone network with a 420% increase in data transfer capacity (Moss & Townsend
2000). Moss and Townsend (2000) also reported that they found an increasing
concentration of Internet backbones in several mid-sized locations that were centrally
located. WorldCom, Sprint, and Cable & Wireless dominated the Internet backbone
network in 1999, controlling about 55% of the domestic market (Telegeography 2000).
By 2001, with mergers and acquisitions flooding the market, WorldCom controlled 37%
of the domestic market (O'Kelly & Grubesic 2002). O'Kelly and Grubesic (2002) found
5 Mbps- Million bits per second. A measurement of data transfer rate.
that East Coast cities in the U.S. experienced a high concentration of Internet bandwidth,
a phenomenon that begins to slowly diffuse westward. They attributed the high
concentration of links and bandwidth in Washington, D.C. to the combination of its role
as a capital city and its high-tech industry. They also concluded that Chicago was the
most accessible city in the U.S. backbone network because it had more
Internet connections or pathways between it and every other U.S. city.
Wheeler and O'Kelly (1999) studied the accessibility levels of 31 Internet
backbones in 1997. They found that Washington, D.C., Chicago, San Francisco, New
York, and Dallas led the U.S. in the most accessible cities in the Internet backbone
network. Malecki and Gorman (2001) used connectivity matrices to determine U.S. city
hierarchies based on 1-hop links and 2-hop links in the bandwidth-weighted matrix. They
concluded that network analysis of the Internet illustrated old and new geographies; the
Internet has changed the meaning of distance, space and the geographical significance
of places, following old routes while also establishing new ones (Malecki & Gorman 2001
in Brunn & Leinbach, p. 103). The use of binary connectivity matrices confirmed the
"strong spatial bias and hierarchical structure of U.S. cities-one that differs from the
conventional population-based hierarchy" (Malecki & Gorman 2001 in Brunn & Leinbach
p. 103). Malecki and Gorman found that the major cities in the economy double as the
major nodes of the Internet.
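The idea of ranking cities by 1-hop and 2-hop links can be illustrated with a small binary connectivity matrix. The sketch below is not the procedure used in the studies cited above (which also drew on bandwidth-weighted matrices); the four-city network, the function names, and the choice to exclude round trips from the 2-hop count are assumptions made purely for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def hop_accessibility(c):
    """Return (one_hop, two_hop) totals for each node of a binary
    connectivity matrix c, where c[i][j] = 1 if a direct link exists.

    one_hop[i] is the number of direct links at node i (row sum of C).
    two_hop[i] is the number of two-link paths from i to other nodes
    (row sum of C squared, ignoring the diagonal, which counts round trips).
    """
    n = len(c)
    c2 = matmul(c, c)
    one_hop = [sum(row) for row in c]
    two_hop = [sum(c2[i][j] for j in range(n) if j != i) for i in range(n)]
    return one_hop, two_hop

# Hypothetical network: links A-B, A-C, A-D and B-C.
cities = ["A", "B", "C", "D"]
C = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
one_hop, two_hop = hop_accessibility(C)
ranking = sorted(cities, key=lambda name: -one_hop[cities.index(name)])
```

In this toy network city A has the most direct links and therefore ranks first on 1-hop accessibility, which is the sense in which a city such as Chicago is described above as the most accessible node.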
History of the Internet and Parallel Developments in Telecommunications
Early Communication Networks
Communication networks date back to antiquity. The story of Phidippides, who in
490 B.C. ran 36.2 km from Marathon to Athens to warn the Athenians of an approaching
army, is one of the earliest examples of a communication network (Holzmann & Pehrson
1995, p. 1). Early communication systems included the Pony Express (1860-1861),
pigeons (as early as 776 B.C.; still used in 1981 by an engineering group in California),
mirrors and flags, fire beacons, watchmen and senators. The first telegraphic device
reportedly dates to around 350 B.C., when a rudimentary device used fire signals to direct
the flow of water in Italy. There followed a two-thousand-year gap in telegraphic devices
until the telescope was invented in 1608 by Hans Lippershey (Holzmann & Pehrson 1995, p. 31).
More modern telecommunication closely related to today's communication networks
arrived in 1844 with the invention of the telegraph (Hugill 1999, Wilson & Corey 2000).
A Brief History of Telecommunications
The telegraph is described as the earliest ancestor of the Internet; like most
communication technologies such as the telephone and the fax machine, the Internet
has been built upon the foundation of the telegraph (Standage 1998). The telegraph was
the first in a long series of inventions and technologies designed to exchange
information electronically (Lebow 1995). The printing press cannot be overlooked,
however, as it allowed the first type of "one-to-many" communication and introduced a
mass-produced format that allowed for fairly rapid exchange of information and data.
Telecommunications changed little until networked computers allowed "many-to-many"
communications (Malecki 2002). The many-to-many communication has been both an
aid and hindrance. For example, users are able to e-mail multiple recipients with news or
information. At the same time, users are subject to annoying e-mail and advertisements
coined as "spam."6 The media are constantly informing us that we are in the midst of a
communications revolution due to rapidly changing information transfer technologies. It
may be relevant to acknowledge that the electric telegraph was a far more disruptive
technology to its era than the Internet has been to us (Standage 1998). Two technologies
are most often credited with having an impact as significant as the Internet:
the printing press and the telegraph (Malecki 2002).
6 Spam is the term used to describe unsolicited "junk" e-mail sent to large numbers of people
to promote products or services.
Two notable differences between telecommunications and transportation have been apparent
since the arrival of the telegraph. First, moving intangible goods, data, and information is
not the same as moving tangible goods (Hillis 1998). Second, telecommunications
functions as a network with simultaneous utilization by many users sending and receiving
such intangibles (Rosenberg 1994), making telecommunications a great influence on
business. In short, business began to use the new technology to exchange information
without involving physical movement of man or animal to do so. Banks and financial
corporations were the first types of businesses to take advantage of this new technology
(Beniger 1986, Gabel 1996). Telecommunications revolutionized interaction between
individuals and institutions as well as created a new platform for networks and
Communication technologies have become increasingly bundled in recent years.
Many types of communication devices are being developed to serve multiple purposes
with maximum convenience. Kellerman (2002, p. 15) describes how it has become
possible to use the computer as a telephone, fax and TV, and receive several of these
services from a single service provider. He muses that "this fusion may possible mature
into a single appliance for information consumption and production, as well as so-called
public networks of data and software" (Halal 1993, as cited in Kellerman 2002, p. 15).
Phone companies such as Sprint and Verizon are now offering cellular phones that have
the capability of wireless Internet access, fax, photos, and more (www.sprint.com,
www.verizon.com). Phone companies are also offering digital subscriber line (DSL) service in
addition to regular phone service. Cable companies have also expanded their services
to compete in the new high-speed Internet access industry. Media conglomerate Time
Warner's cable division recently introduced 'Road Runner' to compete in the Internet
service provider market. Road Runner is a high-speed online service providing unique
broadband7 content, services, and lightning-fast access to the Internet. Road Runner is
delivered to your computer over the same upgraded cable systems that currently bring
cable television to the home (http://www.roadrunner.com). Today, many industry
analysts predict that with the growth of data networks, voice traffic will increasingly travel
over Internet protocol (IP) technology. With increasing data traffic, the demand for
Internet fiber is on the rise (National Research Council 2001).
Telecommunications and the City
Location Decision and Telecommunications
Telecommunications technology has been booming since the early 1970s.
Prominent advancement and change in telecommunications technology have influenced
location decisions in business. Different types of firms are scrambling to locate in areas
rich in technology infrastructure. Traditional location theory typically includes: local input
and output, transferable inputs and outputs, climate, labor supply, taxes, and local
economy. Many firms continue to use traditional location factors but are beginning to
incorporate new factors into location decision, especially high-bandwidth Internet
connection. These firms include those involved in banking, research, marketing,
telecommunications, and many more. High-bandwidth connectivity is an increasingly
attractive asset in firms' location decisions, but at the same time, the firm's location is
pertinent. The assumption that a firm can ignore geographic location because of
technology is false. Geographic location is still an important location factor for many
firms. Firms dependent upon technology search out locations that have strong
technological infrastructure. This infrastructure might include switches, points of
presence (POPs), NAPs, gateways, etc.
7 Broadband describes a transmission facility having a bandwidth sufficient to carry
multiple voice, video or data channels simultaneously.
The presence of modern telecommunication technology reduces transfer and
information acquisition costs. The time required to transfer data is dramatically reduced.
Transfer costs of physical commodities and production inputs for assembly have
traditionally been a main factor of location decision, and still are. However, the actual
transfer of some goods has changed given the switch in emphasis to information
services and intangibles. Data, electronic mail, music, movies, and
news need not be physically transported; these goods can be transferred electronically.
Other goods (such as food, clothing, bicycles, and other tangible goods) must be physically
transported. Many firms use both types of transfer. For example, a company that sells
women's apparel might be involved in e-commerce, but the final clothing
item must be shipped to the customer. So geographic location continues to be an
important factor in this firm's location. The firm must be "connected" technologically,
while at the same time able to ship goods to customers at minimum cost.
"Telecommunication networks are 'friction reducing' technologies that enable transfers
between remote locations for costs that are substantially lower than the physical transfer
of information between them" (Salomon 1988) via human interaction or hard copy, etc.
Whittaker Associates has identified the most important site-selection factors to the
Business Services industry. Of the 51 factors studied, the top-ten include the following:
* Secondary education quality
* Effective cost of skilled labor
* Effective cost of unskilled labor
* Availability of executive, administrative, managerial workers
* Geographic proximity to markets
* Access to business & tech. services
* Business taxes
Telecommunications technology will increase the potential of cities and has, in fact,
since the 1980s revitalized the central business districts of the leading cities and
international business centers of the world: New York, Los Angeles, London, Tokyo,
Paris, Frankfurt, Sao Paulo, Hong Kong, and Sydney. These cities have reached their
highest density of firms ever, providing further evidence that cities are not on the decline
(Sassen 2000). With the knowledge of the benefits sophisticated telecommunication
technology can bring to urban centers, many cities are welcoming infrastructure.
Sophisticated technology can benefit the city by attracting those firms seeking the
infrastructure, such as financial, media, and web-based firms, which in turn provides
jobs, education, and services for those who have access to the infrastructure. Those
who do not have access to the technology are at a disadvantage. Telecommunication
technology is changing the economies of networks, emphasizing service industry rather
Many firms have included technological infrastructure as an important factor in their
location decisions. The highest level of advanced telecommunications
infrastructure is found in the country's largest cities, implying that cities that are better
connected with sophisticated telecommunications infrastructure are more attractive
locations. These high urban centers are able to cash in on the technological amenities they
offer. Many public services such as libraries, tax and finance administrations, and
criminal justice systems are information intensive, dependent upon computers,
telephones, and sophisticated information retrieval and imaging systems. "A city's future
as an information center depends on information-producing activities that occur through
both face-to-face and electronic communications" (Moss 1998).
Infrastructure in Cities
The intra-urban patterns of telecommunication infrastructure are greatly dependent
upon each other. For example, the physical structures of the fiber-optic networks on the
ground are greatly dependent upon interconnection facilities. At the same time,
colocation facility location is just as dependent upon the concentration of fiber networks.
The same cities that lead the U.S. in bandwidth concentration also lead in colocation
facility concentration (McIntee 2001). The relationship between different types of
Internet infrastructure is analogous to the question "which came first, the chicken or
the egg?" The relationship between colocation facilities and Internet bandwidth is no
exception. It is difficult to determine which is more dependent upon the other, as
colocation facilities are interconnection points for Internet backbones, while at the same
time, colocation facilities locate in close proximity to termination points of Internet
The Internet has sparked a concern for new legislation and policy to help protect
those who use the Internet as well as those who are affected by the Internet and its
infrastructure. There are many concerns with the effects of the Internet, and its
infrastructure location within metropolitan areas. The negative effects can be blanketed
under one term: the digital divide (Sassen 2000, Wilson & Corey 2000, Wheeler,
Aoyama & Warf 2000). The digital divide is shorthand for the widening gap between the
rich and the poor, as those with wealth and affluence have access to the
increasingly valuable advantages of sophisticated technology, such as the Internet,
while the poor are further disadvantaged because they do not have equal access to this
Those firms that benefit greatly from Internet infrastructure, especially those
involved in financial services, media, and consulting, are using infrastructure as an
increasingly important factor in location decisions (Finnie 1998, Kotval 1999, Longcore &
Rees 1996). Communities have been using their telecommunications infrastructure as a
strategy to attract new business and to increase their overall economic competitiveness.
In a European survey of 500 companies, telecommunications was cited as the second-
most important factor (Graham & Marvin as cited in Kotval 1999). Cities such as New
York, Boston, and Amsterdam have earned a competitive advantage by establishing
"teleports" (satellite linkages connecting to local telecommunication networks), which
effectively result in a globally networked city. For those who do not have access to the
technology, the ever-growing sophisticated infrastructure could create problems.
Universal access to information and communication technologies is critical in closing the
gap between the economically disadvantaged social groups and the advantaged groups.
Graham (1999) is already describing the unfavored zones within cities as "network
ghettos," places of low telecommunications access and concentrated social
disadvantages. "Uneven global interconnection via advanced telecommunications
becomes subtly combined with local disconnection in the production of urban space"
(Graham 1999). Those who could benefit most from the infrastructure as a tool to
enhance quality of life, job searches, education, and communication, may be those who
are excluded from the sophisticated technology.
Poor and less-advantaged cities that are reluctant to welcome Internet
infrastructure and telecommunication competition may be disadvantaging themselves.
Finnie (1998) studied 25 major cities, and determined that global cities that remain
competitive in attracting business firms lack strict regulation within the
telecommunication sector. According to Finnie, telecommunications services are
becoming increasingly central to business success or failure as competition increases
and sophisticated technology becomes more widely available. As a result, the gap
between the haves and the have-nots could well be narrowing. Because of all the
benefits cities receive when Internet infrastructure is implemented, it is not feasible to
ban growth of these sophisticated networks within urban areas.
Kotkin (2000) has claimed that the digital era we are currently experiencing is a
period of advancement not seen since the industrial revolution. It is hard to argue with this
statement. The dawn of the Net has greatly affected the economic and social geography of
America, and some believe it is redefining the American city hierarchy (Kellerman
2002, Kotkin 2000, Townsend 2001). Numerous titles have been devoted to the digital
revolution, the Internet, and an increasing interconnectedness of our world. Six Degrees
(Watts 2003) and Linked (Albert & Barabasi 2002a) discuss the overlap and
interconnection of networks in modern day life. Some texts, such as Information
Tectonics (Wilson & Corey 2000), The New Geography (Kotkin 2000), and Worlds of E-
Commerce (Leinbach & Brunn 2001) explore the economic, geographic, and social
implications of the digital revolution from the perspective of various disciplines.
There is a growing literature on the Internet, its history and composition, and the
effect of proliferating telecommunication networks on cities and complex regional networks.
This research helps to provide a perspective for examining the U.S. Internet backbone
network. This dissertation will contribute to that literature by analyzing the connective
properties of the U.S. Internet backbone network.
Data and Methodology
Internet backbone data have been obtained from George Mason University School
of Public Policy. The dataset was created by researchers at George Mason University in
2003. The Internet backbone dataset provides a measure of the amount of data capacity
and connections a consolidated metropolitan statistical area (C/MSA) has to move
information to another C/MSA. The data were calculated from the total long-haul fiber
capacity, or bandwidth, connecting a C/MSA to other C/MSAs. Bandwidth is the term
used to describe transmission speed, which is measured in bits per second. According
to Malecki and Gorman (2001), "bandwidth is what makes communications, specifically
Internet Protocol (IP), different from transport networks. The limiting factor of IP
networks is not distance, but the capacity of the bandwidth available on the network from
one location to another" (p. 90). The normal speed of a voice call is 64 kbps.
Transmission speeds above 64 kbps are generally categorized as broadband (Huston
1999, pp. 160-171). The Internet backbone providers are companies that own the
framework of the Internet. This network connects C/MSA nodes and transports data
across long geographic distances. Multiplexing is the process of sending multiple signal
streams of information on a backbone at the same time in the form of a single signal. By
multiplexing, higher bandwidths can be achieved. In 2001, 48 private providers operated
the Internet backbone networks. These firms range from large telecommunication
carriers such as AT&T, MCI WorldCom, Sprint, IBM, and Cable & Wireless, to smaller,
lesser-known firms. The backbone providers are called autonomous systems (AS),
which means they operate independently from other systems, setting their own policies
and network structure (Malecki & Gorman 2001, pp. 92-93). These independent
networks interconnect to form a larger network, thus creating the Internet backbone. In
1995, Huitema suggested that the Internet backbone network is the best indicator of the
geography of the Internet. The amount of bandwidth between C/MSAs is not equal.
Chapter 4 analyzes the network as though the bandwidth connections are equal, while
Chapter 5 addresses the inequalities by adding weights to the analysis.
A geographic information system (GIS) was created with the Internet backbone
network data, using ArcGIS from Environmental Systems Research Institute (ESRI).
Several types of telecommunications data with different characteristics and capabilities
were also incorporated into the GIS. The GIS data were statistically analyzed to determine
spatial relationships between the Internet backbone network and the other infrastructure
types and to understand the distribution of Internet bandwidth. The telecommunication
infrastructure data were geocoded,
giving each datum spatial attributes so it can be analyzed in conjunction with other types
of data in a common information system. Telecommunication infrastructure data
obtained for this research project include the following:
Telephone Switches (Digital, Wireless)
Cellular Towers (Towers, Antennas)
Fiber Lit Buildings
Colocation Facilities
Network Access Points (NAPs)/Metropolitan Area Exchanges (MAEs)
Marine Cable Landings
Fiber Points of Presence (POPs, termination points)
The following is a description of the data used in the GIS and statistical analysis
performed in Chapter 6:
Telephone switches. This database includes the location, capability, and
ownership of telephone switches. The description includes details such as which switches
are wireless, digital, or integrated services digital network (ISDN). Customers might
purchase a single type of telephone switch they are interested in, such as wireless
switches. The data are geocoded and can be used in a geographic information system.
Cellular towers. This is a complete database of cellular towers in the United States,
including the tower owner and the capabilities of the structure. Data are also available on the
auction results of metropolitan areas. Auctions provide an economic valuation of regions
by private industry for the implementation of a technology. The auction values of a
region provide a new insight into how emerging technologies are affecting the urban
hierarchy of regions.
Fiber lit buildings. This is a very extensive database that includes numerous fiber
carriers. The description includes carriers, street addresses of termination points, type of
fiber, capacity and status of fiber, Common Language Location Identifier (CLLI) codes, and
geocoded locations. The data are geocoded and address matched, determining the building and
spatial attributes of the fiber location. This database includes dark fiber, lit fiber, fiber
currently in existence, fiber "in the pipe," as well as network expansion details of future
fiber locations. Fiber loops are also included in this database as are carrier "lit" buildings
and metro fiber routes. (Source Geo-tel 2002).
Internet interconnection facilities/colocation facilities. These provide
network providers with floor space for their network equipment within a secure building.
The building is typically equipped with appropriate heating, ventilation and air-
conditioning (HVAC), enhanced fire suppression, electrical connections and diesel-
powered generators to guard against commercial power failures. Private companies
often choose to interconnect in these facilities, as well, to avoid costly local loop charges
and to have the ability to cross connect to their carrier of choice. Companies also
colocate to utilize multiple carriers so that if their primary carrier's network fails, they can
reroute network traffic to the backup carrier already in place.
Network access points (NAP), metropolitan area exchanges (MAE), Internet
exchange (IX) facilities, carrier hotels (Geo-tel 2002). These are Internet
interconnection facilities on a grand scale. Some of these facilities are considered
"public" interconnection facilities, while others (MAE) are privately held. Many of these
mega facilities were original interconnection points for the early Internet.
Marine cable landings. These data show the exact location and termination of the
marine cableheads and list the carriers located in the cable. A description of each
carrier's fiber in the marine pipe is also available. The spatial attributes for this data are
also included in the database (Source Geo-tel 2002).
Point-of-presence data (POP). This data includes the carrier of the POP and the
spatial attributes of the POP. This data set consists of carrier fiber points of presence
(POPs) that signify termination of fiber lines that provide connectivity to a location,
typically office buildings (Geo-tel 2002).
In addition to telecommunication infrastructure data, a database with descriptive
statistics of the metropolitan areas was compiled. The database included population,
bank deposits, income, and the local economy's dependence on specific sectors, such
as finance, insurance, and real estate (FIRE), as well as other factors that could be used
as interactive variables in the model. The descriptive data will be discussed in more
depth in Chapter 6.
The data have been modeled in a geographic information system (GIS), using
ESRI's ArcGIS software, to study the distribution of telecommunication infrastructure in
conjunction with similar infrastructure, as well as to study the hierarchy of nodes. Chapters 4 and 5 of
this dissertation use the long-haul fiber optic bandwidth data for the analysis performed
within. Chapter 4 uses the unweighted long-haul fiber optic bandwidth data for the
unweighted analysis. Chapter 5 adds weights to the analysis. Chapters 4 and 5 include
maps that display the long-haul fiber optic bandwidth data with other types of
infrastructure. However, the analysis for these chapters was performed solely using the
fiber network data. Chapter 6 analyzes the Internet backbone data in
conjunction with various types of telecommunication infrastructure data as well. The
procedures performed in Chapter 6 also incorporate the descriptive data for C/MSAs into
The matrix-based frameworks for analyzing the overall impact of nodal or linkage
distribution on the connective properties of a network have not been applied to the
U.S. Internet backbone network. This research intends to identify the most critical links
and nodes in the domestic long-haul fiber network in the U.S. The research methods
contained within this chapter can then be applied to other types of telecommunication
networks to answer similar research questions.
Examining telecommunication networks as a graph has proven to be highly useful
in answering the proposed research questions and testing the hypotheses. This chapter
introduces and explains graph theory and matrix multiplication. The methods reviewed
here will provide a framework for network analysis and will be used to examine the
Internet backbone network in Chapters 4 and 5. The purpose of this chapter is to provide
a thorough explanation of the graph theory and matrix multiplication that will be used in the
following chapters of this dissertation.
Introduction to Graph Theory
A fundamental question in network analysis is the degree to which the nodes are
interconnected. The connectivity of a network is defined by the overall degree of
connection between all vertices. The degree of connection between all vertices is
probably the most important structural property of the network (Taaffe & Gauthier 1973,
p. 101). This section is based largely on Taaffe & Gauthier's 1973
publication, Geography of Transportation. Any network can be represented as a graph
(Haggett, Cliff & Frey 1977, Garrison 1960, Kansky 1963). As spatial structures, networks
are extremely complex in nature. This makes networks difficult both to describe and
analyze. By simplifying networks, we are able to study their characteristics. When
applying graph theory to a network in order to analyze it, it is necessary to model the
network in the form of a graph. As the network is simplified in analysis preparation, some
of the information about the network will be discarded purposely. Only those pieces of
information that are most relevant to analysis when using graph theory are taken into
account. Noting this, not all networks should be described in terms of graph or matrix
theory. Topological analysis takes into account only interconnection, excluding
properties such as shape, direction, and size. When the network is studied as a graph,
only the topological properties of the network are considered. The large range of
characteristics that might be identifiable with various networks are not analyzed in graph
Graph theory breaks the network into points and lines, in an abstract manner.
Although it does not model the real world directly, it provides measurement for some
structural properties of "a real-world system if that system is idealized as a set of points
connected by a set of lines" (Taaffe & Gauthier 1973, p. 101). In the simplest form,
networks can be represented by a series of vertices (representing nodes) and a series of
edges (representing links), with a relationship of incidence that associates each edge
with two vertices. Only the presence or absence of a connection is known for each
pair of nodes and represented in graph form. There are two ways to
measure the described network: (1) a single number or (2) a vector of numbers. A single
number describes the aggregate geometrical pattern of the network, while the vector of
numbers measures the relationship of the individual components of the network to the
entire network (p. 101).
There is minimal information given about this network (see Figure 3-1, Hypothetical
Network A), so only primitive measures of connectivity can be assumed. The node-link
relationships are the only information given to derive conclusions about connectivity.
Figure 3-1. Network A
The first measurements that can be taken are the number of links and nodes. In
Figure 3-1, there are 8 nodes (v) and 7 links (e) in the network. Moreover, the network is
minimally connected, as there is only one path between any pair of nodes. Note that
there are no redundant links within the network, meaning that no node has more than
one direct connection to any other node. Redundancies occur when more than one link
connects the same two places. With any minimally connected network the number of
links is always one less than the number of nodes: $e_{\min} = v - 1 = 8 - 1 = 7$. Note that
removing any link in this network will disconnect the network into two parts.
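The two properties just noted (e = v - 1, and disconnection on the removal of any link) can be checked mechanically. The following is a minimal Python sketch only; the edge list is a hypothetical stand-in with 8 nodes and 7 links, since the actual layout of Figure 3-1 is not reproduced here, and the function name is illustrative.

```python
from collections import deque

def is_connected(nodes, links):
    """Return True if every node can be reached from the first node."""
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)

# Hypothetical stand-in for Network A: 8 nodes joined by 7 links, no circuits.
nodes = list(range(1, 9))
links = [(1, 2), (2, 3), (3, 4), (3, 5), (5, 6), (6, 7), (6, 8)]

# Minimal connection: the network is connected and e = v - 1.
minimally_connected = is_connected(nodes, links) and len(links) == len(nodes) - 1

# Removing any one link from a minimally connected network disconnects it.
every_removal_disconnects = all(
    not is_connected(nodes, links[:i] + links[i + 1:])
    for i in range(len(links))
)
print(minimally_connected, every_removal_disconnects)
```

The same check applied to a network with circuits would report that at least some single-link removals leave the network connected.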
Because network connectivity is most meaningful when a network is either
compared to another network or used in measuring growth, another hypothetical network
(B) is shown as an example for comparison (Figure 3-2).
Network B is more complex than Network A. Network B has 8 nodes and 11 links.
This network is more than minimally connected. Most of the nodes in this network are
connected to more than one node. When this type of structure exists, the removal of one
link will not necessarily disconnect the entire network.
Figure 3-2. Network B
In order to compare these two networks (A & B), connectivity measures must be
employed. Graph theory provides various simple measures. The most often employed
measures include the gamma and alpha indices.
The Gamma Index
The Gamma index is the ratio of the number of edges in a network to the maximum
number possible in that network:
$\gamma = \frac{\text{actual edges}}{\text{max edges}} = \frac{e}{e_{\max}}$
The number of links in examples A & B can be obtained by counting. There are 7
links in example A and 11 links in example B. The number of possible links ($e_{\max}$) can be
computed from the number of nodes in the system. If the network is represented as a
planar graph (one where intersections occur only at nodes), the addition of each node to
the system increases the maximum number of links by 3. This holds true for any planar¹
network of more than two nodes. Because the graph is planar, the intersection of new
links is not possible without the addition of a new vertex. To express $e_{\max}$, use $3(v-2)$.
The gamma index then becomes $\gamma = \frac{e}{3(v-2)}$.
When using the gamma index to determine maximal connectivity, the relationship
between the number of nodes (v) in a network and the maximum number of links ($e_{\max}$)
is $e_{\max} = 3(v-2)$. The gamma index is expressed in terms of a graph-theoretic range
that varies from a set of nodes that have no interconnections at one end of
the spectrum to a set of nodes in which every node has a link that connects it to
every other node in the network at the other end. The connectivity "is evaluated in terms of
the degree to which the network deviates from an unconnected graph and approximates a maximally
connected one" (Taaffe & Gauthier 1973). The gamma index falls between 0
and 1. Using network A as an example, the gamma index would be
$\gamma = \frac{e}{e_{\max}} = \frac{7}{3(8-2)} = \frac{7}{18} = .39$. In network B, the gamma index would be
$\gamma = \frac{e}{e_{\max}} = \frac{11}{3(8-2)} = \frac{11}{18} = .61$. In terms of maximal connectivity, the first network is 39%
connected while the second network is 61% connected.
¹ Planar networks form vertices whenever two edges cross, whereas non-planar
networks can have edges cross and not form vertices.
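As a check on the arithmetic, the gamma index for the two example networks can be computed directly from the link and node counts given in the text. This is a minimal sketch; the function name gamma_index is illustrative and not part of any library.

```python
def gamma_index(e, v):
    """Ratio of actual links e to the planar maximum 3(v - 2)."""
    if v < 3:
        raise ValueError("the planar maximum 3(v - 2) requires at least 3 nodes")
    return e / (3 * (v - 2))

gamma_a = gamma_index(7, 8)    # Network A: 7 links, 8 nodes
gamma_b = gamma_index(11, 8)   # Network B: 11 links, 8 nodes
print(round(gamma_a, 2), round(gamma_b, 2))  # 0.39 0.61
```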
When discussing minimal networks, the possibility of linkage removal was
discussed. This would sever the connectivity of the network into two separate pieces.
Linkages can also be added to a network, thus increasing the connectivity beyond the
minimal structure, adding redundancy and/or alternate paths. Additional linkages create
circuits. Circuits can be defined as a definite path where the original node of the linkage
sequence coincides with the terminal node. If a circuit is present, then it establishes
additional or alternate paths in the network. The number of linkages that are added to
the minimal network defines the number of alternative paths. The maximum number of
independent circuits in a network is also a function of the number of nodes in the
network and the number of linkages necessary for minimal connection between nodes.
The alpha index is a ratio measure of actual circuits, given by $(e - v + 1)$, to the
maximum number possible in a given network. In a connected network where links are e
and nodes are v, the number of links is equal to one less than the number of nodes
$(e = v - 1)$ only when the network is minimally connected. When there is a circuit in the
network, the number of links is greater than $(v - 1)$: $e > v - 1$. By subtracting the number
of links that are needed for a minimally connected network $(v - 1)$ from the actual number
of links (e), we can obtain the number of circuits in the network. According to Taaffe
and Gauthier (1973), this can be expressed by $e - (v - 1) = e - v + 1$. The result is a
measure of the number of independent circuits in the network. The maximum number of
independent circuits is also a function of the number of nodes in the network and the
number of linkages necessary to maintain minimal connectivity between nodes. For a
planar network, the maximum number of links is $3(v-2)$, thus the maximum number of
circuits would be $3(v-2) - (v-1) = 2v - 5$. The alpha index is a ratio measure of the
number of actual circuits $(e - v + 1)$ to the maximum number possible in a given network
$(2v - 5)$:
$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5}$
The range of the index is from a value of 0 for a minimally connected network, to a
value of 1 for a maximally connected network. For the sake of convenience, the
numerical value may be expressed as a percentage of circuitry in a network. The alpha
value for network A is
$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5} = \frac{0}{11} = 0$
The alpha value for network B is
$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5} = \frac{4}{11} = .36$
The first network exhibits no circuitry. In the second network, the maximum
possible number of circuits is 11, but there are only 4 circuits. The second network's
circuitry is 36% of the maximum.
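The alpha values for the two example networks can be verified in the same way. The sketch below assumes a connected planar network, as the formula in the text does; the function name alpha_index is illustrative.

```python
def alpha_index(e, v):
    """Ratio of independent circuits (e - v + 1) to the planar maximum (2v - 5).

    Assumes a connected network; for a disconnected network the
    numerator e - v + 1 no longer counts circuits.
    """
    if v < 3:
        raise ValueError("the planar maximum 2v - 5 requires at least 3 nodes")
    return (e - v + 1) / (2 * v - 5)

alpha_a = alpha_index(7, 8)    # Network A: 0 circuits out of a possible 11
alpha_b = alpha_index(11, 8)   # Network B: 4 circuits out of a possible 11
print(round(alpha_a, 2), round(alpha_b, 2))  # 0.0 0.36
```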
It was mentioned previously that graph-theoretic indices of connectivity are also
useful for measuring network growth or change through time. As an example, let us
consider that our example networks, A and B, are part of an idealized sequence of transport
development. For the purpose of explanation, two more networks will be added: C and D
(Figures 3-3 & 3-4). Network C will represent the network prior to A (Figure 3-3).
Network D will represent the final network (Figure 3-4). Network C will have fewer
connections than either A or B, and D will have more connections than A, B, or C
(Figures 3-1, 3-2, 3-3 & 3-4).
Figure 3-3. Network C
Figure 3-4. Network D
In the beginning stage, which is illustrated by Network C, there are a few links
leading to interior centers. In the next stage, illustrated by Network A, growth is evident.
The network has expanded and includes all of the region's nodes. The growth process
continues in Network B (Figure 3-2). Network D is an example of a mature network
(Figure 3-4). The number of nodes has remained constant during the network's growth,
but the connectivity of the network has changed. By using the alpha and gamma indices
we can determine to what degree the network connectivity has changed, as well as
identify the change in the network's spatial structure.
Figure 3-5. Stages of network development
If we arrange the indices in a table, we can see that as the network grows, the
connectivity index increases. As the network becomes more structurally complex, both
the gamma and alpha indices increase.
Table 3-1. Structural indices for sequence of network development
Stage                 Gamma   Alpha
Stage 1, Network C    .22
Stage 2, Network A    .39     0
Stage 3, Network B    .61     .36
Stage 4, Network D    .78     .63
Three basic network configurations are used to relate the gamma and alpha
indices to more specific network characteristics: spinal, grid, and delta. The spinal
network shares the characteristics of a minimally connected network: every node is
connected to at least one other node, and traffic can flow between the nodes but by only
a single path. The number of links necessary for a minimally connected network is
always one less than the number of nodes in the network $(v - 1)$. We can conclude that
the gamma index, $\gamma = \frac{e}{3(v-2)}$, for a minimally connected network will be
$\gamma = \frac{v-1}{3(v-2)}$.
The alpha index is $\alpha = 0$, because there are no circuits in a minimal network.
For the spinal network this is illustrated as $\alpha = \frac{(v-1) - v + 1}{2v - 5} = \frac{0}{2v - 5} = 0$.
The delta network composition is a stark contrast to the spinal network. The delta
network is composed of a high density of linkages in relation to the number of nodes.
The delta network composition is one of numerous paths, sequences, or links achieving
maximal connectivity. The shape most dominant in the delta network is the
triangle, for each set of 3 nodes. When a node is added to a network of more than three
nodes, two new links are required. The relationship will always remain $e = 2v - 3$. Since this
relationship will remain constant, the gamma index will be
$\gamma = \frac{e}{3(v-2)} = \frac{2v-3}{3(v-2)}$.
The alpha index, $\alpha = \frac{e - v + 1}{2v - 5}$, will always be
$\alpha = \frac{(2v-3) - v + 1}{2v - 5} = \frac{v - 2}{2v - 5}$.
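The closed forms for a delta network, in which e = 2v - 3, can be checked against the general definitions of the two indices. The following is a minimal sketch using exact fractions; the function names are illustrative, and the node counts in the loop are arbitrary choices made only to show how the values behave as v grows.

```python
from fractions import Fraction

def delta_gamma(v):
    """Gamma index of a delta network: (2v - 3) / (3(v - 2))."""
    return Fraction(2 * v - 3, 3 * (v - 2))

def delta_alpha(v):
    """Alpha index of a delta network: (v - 2) / (2v - 5)."""
    return Fraction(v - 2, 2 * v - 5)

for v in (4, 8, 100, 10_000):
    e = 2 * v - 3  # number of links in a delta network with v nodes
    # The closed forms agree with the general definitions of each index.
    assert delta_gamma(v) == Fraction(e, 3 * (v - 2))
    assert delta_alpha(v) == Fraction(e - v + 1, 2 * v - 5)
    print(v, float(delta_gamma(v)), float(delta_alpha(v)))
```

As v becomes large, the printed gamma values fall toward 2/3 and the alpha values fall toward 1/2.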
The third type of network configuration is the grid. The grid network represents the
transition network that is sandwiched between the spinal network and delta network. It is
a medium between the minimal and maximal network.
To categorize a transport network as spinal, delta, or grid, cutoff values must be
established. By determining a scale of alpha and gamma indices for each of the network
types, the ranges can be established for each category.
The largest and smallest gamma values are used to identify a spinal network.
Taaffe and Gauthier express the gamma index for a spinal network configuration as
$\frac{v-1}{3(v-2)}$. Alternate expressions given include $\frac{1}{3}\left(\frac{v-1}{v-2}\right)$
and $\frac{1}{3}\left(\frac{v}{v-2} - \frac{1}{v-2}\right)$.
Taaffe and Gauthier suggest that, for networks containing a large number of nodes, $\frac{v}{v-2}$
will approach 1 and $\frac{1}{v-2}$ will approach zero. In sum, this means that the expression
will approach 1/3 of (1-0), or 1/3. For the smallest networks, the value of the gamma index will be 1/2.
This means that spinal networks can be categorized within a range of values from 1/3
to 1/2, or .333-.5.
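The limiting argument for the spinal network can be written out explicitly. The following is a restatement in the notation of this chapter, offered as a sketch rather than as a reproduction of Taaffe and Gauthier's own presentation; the value at v = 4 is included only to show where the upper bound of 1/2 arises.

```latex
\gamma_{\text{spinal}}
  = \frac{v-1}{3(v-2)}
  = \frac{1}{3}\left(\frac{v}{v-2} - \frac{1}{v-2}\right),
\qquad
\lim_{v \to \infty} \gamma_{\text{spinal}}
  = \frac{1}{3}\,(1 - 0) = \frac{1}{3},
\qquad
\gamma_{\text{spinal}}\Big|_{v=4} = \frac{3}{3 \cdot 2} = \frac{1}{2}.
```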
For the delta network configuration, which has maximal connectivity, the gamma
index is $\frac{2v-3}{3(v-2)}$ or $\frac{1}{3}\left(\frac{2v-3}{v-2}\right)$. Delta networks will have a range of values between 2/3
and 1.0. Table 3-2 shows the range of values for the three classical networks.
Table 3-2. Range of values for the gamma index for three classical network patterns
Spinal   1/3 ≤ γ ≤ 1/2    where v > 4
Grid     1/2 < γ < 2/3    v > 4
Delta    2/3 < γ < 1.0    v > 3
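The cutoffs in Table 3-2 can be applied programmatically. The sketch below classifies the gamma values reported in Table 3-1; the treatment of values that fall exactly on a boundary, and the label used for values below 1/3, are assumptions made for illustration and are not specified in the source.

```python
def classify_by_gamma(gamma):
    """Assign a gamma value to one of the three classical network patterns."""
    if not 0 <= gamma <= 1:
        raise ValueError("gamma is expected to lie between 0 and 1")
    if gamma < 1 / 3:
        return "below spinal range"   # hypothetical label, not from Table 3-2
    if gamma <= 1 / 2:
        return "spinal"
    if gamma < 2 / 3:
        return "grid"
    return "delta"

# Gamma values for the four stages, taken from Table 3-1.
stage_gammas = {"C": 0.22, "A": 0.39, "B": 0.61, "D": 0.78}
gamma_labels = {name: classify_by_gamma(g) for name, g in stage_gammas.items()}
print(gamma_labels)
```

Network C falls below the spinal range because it is not yet fully connected, so the three classical patterns do not strictly apply to it.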
Considering the network examples A-D, we can see how the network has changed
from a spinal network to a delta network. In the first stage of the network, the gamma
value is .22. This is expected, as the nodes are isolated, demonstrating minimal
connectivity. In the second stage the gamma value is higher, .39. The network is more
connected as the gamma value indicates. The third stage finds the connectivity value
even higher, at .61. The fourth and final stage of the network has a gamma value of .78.
By this stage, the network displays clearly the triangle configuration that is characteristic
of the delta network.
The alpha index can also be used to define network configuration. As discussed
previously, the absence of circuits means the alpha index value will be zero. Thus, the
alpha value for a spinal network will be zero. The alpha value range for grid or delta
networks depends on how many circuits exist. By defining limits of the alpha range it will
be possible to determine the network configuration. As defined by Taaffe and Gauthier,
the delta configuration for the alpha index is
$\alpha = \frac{(2v-3) - v + 1}{2v - 5} = \frac{v - 2}{2v - 5}$. The alpha index
ranges for the three classical network patterns are presented in Table 3-3.
Table 3-3. Range of values for the alpha index for three classical network patterns
Spinal   α = 0          where v = e + 1
Grid     0 < α < .50    v > 3
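The alpha ranges can be applied in the same manner. The sketch below uses the spinal and grid ranges shown in Table 3-3; the cutoff of .50 and above for the delta pattern is an assumption inferred from the upper bound of the grid range, since no delta row appears in the table as reproduced here.

```python
def classify_by_alpha(alpha):
    """Assign an alpha value to one of the three classical network patterns."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha is expected to lie between 0 and 1")
    if alpha == 0:
        return "spinal"
    if alpha < 0.5:
        return "grid"
    return "delta"   # assumed cutoff: alpha of .50 or more

# Alpha values reported in the text for Networks A, B, and D.
stage_alphas = {"A": 0.0, "B": 0.36, "D": 0.63}
alpha_labels = {name: classify_by_alpha(a) for name, a in stage_alphas.items()}
print(alpha_labels)
```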
The two indices, gamma and alpha, complement each other in network
measurement. As a network increases in spatial complexity, the change in indices will
be similar. Looking again at the example network, the gamma value for the second stage
(.39) corresponds with the spinal network. This is consistent with the alpha range for
spinal networks: alpha is 0 for the network's second stage. The alpha value for the third
stage is .36, categorizing it as a grid network. The gamma value for the third stage is
.61, also denoting a grid network. In the fourth stage of the network, both values fall under
the delta configuration. The alpha value is .63 and the gamma value is .78 for the fourth
stage. The gamma and alpha indices measure network connectivity and circuitry to
describe the network.
Five measures of graph theory were introduced in the preceding text: number of
nodes, number of links, alpha index, gamma index, and number of circuits. Diameter is a
sixth measure, which has not yet been defined. Diameter is a measure of the span of
transportation networks, defined as the minimum number of links that are required to
connect the farthest two nodes of a network. Hence, a
diameter of five would indicate that at most five links separate any two nodes
in a network along the shortest path between them. For example, in network B (Figure 3-2)
the diameter is 4: no more than
four links separate any two nodes in Figure 3-2. While these indices are useful
descriptive tools, it is necessary to remember that graph theory does not include many
complexities found in practice. In short, graph theory simplifies a network and analyzes its
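Diameter can be computed by finding the shortest path, counted in links, between every pair of nodes and taking the largest of those values. The following is a minimal sketch using breadth-first search; the six-node edge list is hypothetical and is not one of the networks in the figures, and the code assumes the network is connected.

```python
from collections import deque

def shortest_link_counts(neighbors, source):
    """Breadth-first search: fewest links from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def diameter(links):
    """Largest shortest-path length (in links) over all pairs of nodes."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return max(
        max(shortest_link_counts(neighbors, node).values())
        for node in neighbors
    )

# Hypothetical network: a six-node chain with one extra link between 2 and 5.
example_links = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 5)]
print(diameter(example_links))  # 3
```

Without the extra link between nodes 2 and 5, the chain alone would have a diameter of 5, which illustrates how added links shorten the span of a network.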
By measuring the accessibility of a node we can determine the hierarchy or the
system of competition that may exist between the nodes in a given network. The addition
of linkages, or the destruction or removal of nodes or linkages, affects the entire
network. Changes of this type also reflect changes in the accessibility or hierarchy of
nodes. Graph theory is used to measure these changes as well as to determine if a
hierarchy or system of nodes exists as defined in terms of connectivity. Just as any
network can be represented as a graph, a matrix can also represent any network. By
representing a network as a matrix, numerous questions concerning the network's
accessibility can be answered.
Traditionally the origin nodes of a network are represented in the horizontal rows of
a matrix and the destination nodes are represented in the vertical columns. The number
of rows and columns in a matrix must be identical. Relationships between the nodes are
represented by corresponding cells. The points (nodes) of the graph are labeled and the
labels are used to identify both rows and columns of the matrix. When two points in the
network are connected, this link is represented by placing a non-zero number (typically a
value of 1 in the case of a binary connectivity matrix) at the intersection of the relevant
row and column. If there is no connection, a zero is placed at the intersection.
Table 3-4 represents a network in its matrix format. The number of nodes in the network
illustrated is given by both the number of columns and the number of rows in the matrix.
A non-zero number represents a link between the corresponding nodes, and the absence
of a direct link is represented by a zero. Also, the connection of a node to itself has no
value, so a zero is recorded in the corresponding cells. For example, the cell at the
intersection of row 2 and column 2 contains a zero. This matrix only gives information
on the presence or absence of a direct connection between nodes.
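The construction described above can be sketched in a few lines of Python. This is a
minimal illustration rather than the procedure used in this study: the function name
`connectivity_matrix` is an assumption, and the five links are those of the six-node
example network given in Table 3-6.

```python
def connectivity_matrix(n_nodes, links):
    """Build a symmetric binary connectivity matrix from undirected links.

    Nodes are numbered 1..n_nodes; `links` holds (origin, destination) pairs.
    """
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for origin, destination in links:
        # An undirected link is recorded in both directions.
        matrix[origin - 1][destination - 1] = 1
        matrix[destination - 1][origin - 1] = 1
    return matrix


# Links of the six-node example network (Table 3-6).
LINKS = [(1, 2), (2, 5), (3, 4), (4, 5), (5, 6)]
C = connectivity_matrix(6, LINKS)

for label, row in zip(["V1", "V2", "V3", "V4", "V5", "V6"], C):
    print(label, *row)
```

The main diagonal stays at zero because no node is linked to itself, and the matrix is
symmetric because each link can be traversed in either direction.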
Table 3-4. Example of a binary connectivity matrix
Note: Columns follow the same city order as the rows.
Albuquerque    0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Atlanta        0 0 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0 1 0 1 1
Birmingham     0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Boise          0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0
Buffalo        0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0
Charlotte      0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
Chicago        0 1 0 1 0 0 0 1 1 1 1 1 1 0 1 0 0 1 1 0 1
Cincinnati     0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1
Cleveland      0 1 0 0 1 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1
Houston        1 1 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0 1 0 1 1
Kansas City    0 1 0 0 0 0 1 0 1 0 0 0 1 0 1 1 0 1 1 0 1
Las Vegas      0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0
Los Angeles    0 1 0 0 0 0 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1
New Orleans    0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
New York       0 1 0 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1 1 0 1
Oklahoma City  0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
San Diego      0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0
San Francisco  0 1 0 0 0 1 1 1 0 1 1 1 1 0 1 0 1 0 1 0 1
Seattle        0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 1 0 0 1
Tampa          0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1
Washington     0 1 0 0 0 1 1 1 1 1 1 0 1 0 1 0 0 1 1 1 0
Nodal accessibility can be derived directly from the binary connectivity matrix; this
is the simplest measure of the accessibility of a node. The sum of each row of the
matrix gives the number of direct links between the given node and
other nodes. The higher the row total value, the more accessible the given node is to
other nodes in the network. Taaffe and Gauthier (1973) use an air transportation network
to demonstrate the accessibility of a node (p. 119). In their network, both New York and
Chicago have direct flights to each of the other cities in the network. This means that the
accessibility of New York and Chicago is greater than other cities in the network. The
long-haul fiber data acquired for this project will be used to demonstrate a similar
example. A random sample of twenty-one metropolitan areas is used.
Table 3-6. Network represented in matrix form
Nodes V1 V2 V3 V4 V5 V6
V1    0  1  0  0  0  0
V2    1  0  0  0  1  0
V3    0  0  0  1  0  0
V4    0  0  1  0  1  0
V5    0  1  0  1  0  1
V6    0  0  0  0  1  0
The matrix in Table 3-4 gives information on the presence of links between each pair of
cities. For example, there is a direct link between Atlanta and Birmingham: a "1" at the
intersection of Atlanta and Birmingham signifies that a link exists between the two cities.
The matrix also denotes the absence of direct connections. There is no direct link
between Boise and Albuquerque, so a value of "0" appears at the intersection of Boise and
Albuquerque (Table 3-4). The matrix also tells us that New York is directly connected to
eleven other cities in the sample network.
Table 3-7 illustrates a hierarchy of the twenty-one cities shown in matrix format in
Table 3-4, based on long-haul fiber direct links. Atlanta, Chicago, San Francisco, and
Washington, DC, are the most directly accessible nodes in the network measured.
Summing the corresponding row for each of these cities gives a total of twelve direct
links. Based only on direct linkages, these cities rank at the top of the hierarchy.
Table 3-7. Hierarchy of network cities
City Rank Number of links
Atlanta 1 12
Chicago 1 12
San Francisco 1 12
Washington 1 12
New York 5 11
Houston 6 10
Los Angeles 6 10
Kansas City 8 9
Cleveland 9 7
Seattle 9 7
Las Vegas 11 5
Cincinnati 12 4
New Orleans 12 4
San Diego 12 4
Tampa 12 4
Boise 16 3
Charlotte 16 3
Birmingham 18 2
Buffalo 18 2
Oklahoma City 18 2
Albuquerque 21 1
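The hierarchy in Table 3-7 follows directly from row totals. A minimal Python sketch of
that ranking is given here for the smaller six-node matrix of Table 3-6. The function
name `rank_by_direct_links` is hypothetical, and tied nodes share a rank in the same
way as the tied cities in Table 3-7 (for example, four cities at rank 1 followed by rank 5).

```python
# Binary connectivity matrix of the six-node example network (Table 3-6).
C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
LABELS = ["V1", "V2", "V3", "V4", "V5", "V6"]


def rank_by_direct_links(matrix, labels):
    """Return (label, rank, number of direct links), best connected first."""
    totals = [sum(row) for row in matrix]  # row total = number of direct links
    order = sorted(range(len(labels)), key=lambda i: -totals[i])
    ranking = []
    for position, i in enumerate(order, start=1):
        if ranking and totals[i] == ranking[-1][2]:
            rank = ranking[-1][1]  # tie: share the rank of the previous node
        else:
            rank = position
        ranking.append((labels[i], rank, totals[i]))
    return ranking


HIERARCHY = rank_by_direct_links(C, LABELS)
for label, rank, links in HIERARCHY:
    print(label, rank, links)
```

As the text notes, a ranking of this kind reflects direct links only and says nothing
about capacity or indirect connections.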
Much more can be derived from the matrix; however, it is important to note that this
measure does have limitations as it only reveals the presence or absence of links and
not their capacity to support flows. Though a node may have a high connectivity level
based on direct connections, it may be lower in the network hierarchy when indirect
connections are included in the measurement of accessibility.
Indirect connections can be counted by using matrix multiplication. This
is an element-by-element method that involves multiplying each row of one matrix by
each column of another matrix. The sum of the products of the element-by-element
multiplication is recorded in the corresponding cell of the new matrix. The first matrix is
multiplied by itself; the product is then multiplied by the original matrix to
get the next product. That product is in turn multiplied by the original matrix, and
so on.
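The row-by-column procedure can be written out explicitly. The sketch below is an
illustration in plain Python, with `matmul` as an assumed helper name, applied to the
six-node matrix of Table 3-6. It multiplies the matrix by itself and then multiplies
the product by the original matrix once more.

```python
# Binary connectivity matrix of the six-node example network (Table 3-6).
C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]


def matmul(a, b):
    """Multiply two square matrices of equal size, row by column."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


C2 = matmul(C, C)   # number of two-link paths between each pair of nodes
C3 = matmul(C2, C)  # number of three-link paths

print("two-link paths from V1 to V5:", C2[0][4])
print("three-link paths from V5:", C3[4])
```

Each diagonal cell of C2 equals the number of direct links of that node, because every
link can be traversed out and back in two steps. This is one source of the redundant
paths discussed later in the chapter.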
Figure 3-6 illustrates the matrix multiplication of the six-node matrix introduced in
Table 3-6. The matrix is first multiplied by itself (C · C) to produce C2. The value of
each cell of C2 is $c^{2}_{ij} = \sum_{k} c_{ik} c_{kj}$. Each two-link path from node i
to node j through an intermediate node k is represented by the product
$c_{ik} c_{kj}$ (Taaffe & Gauthier p. 122). The original graph is shown in the top left
corner of the graphic. The resultant matrix, C2, is shown in the bottom right corner of
the graphic. The presence or absence of two-link paths can be determined from the
resultant matrix C2. Two-link paths are represented by non-zero entries in the matrix.
A zero in the matrix indicates that no path of exactly two links exists between the
corresponding nodes.
Matrix C
Nodes V1 V2 V3 V4 V5 V6
V1    0  1  0  0  0  0
V2    1  0  0  0  1  0
V3    0  0  0  1  0  0
V4    0  0  1  0  1  0
V5    0  1  0  1  0  1
V6    0  0  0  0  1  0
Matrix C2 = C · C
Nodes V1 V2 V3 V4 V5 V6
V1    1  0  0  0  1  0
V2    0  2  0  1  0  1
V3    0  0  1  0  1  0
V4    0  1  0  2  0  1
V5    1  0  1  0  3  0
V6    0  1  0  1  0  1
Figure 3-6 Matrix Multiplication
The resultant matrix, C2, tells us which pairs of nodes have two-link paths
connecting them. For example, nodes 1 and 5 have a two-link path between them. If we
look at the original matrix (C), we find that there is no direct connection between these
two nodes. Nodes 3 and 4 have a direct link between them, but not a two-link path.
Note that the most distant nodes in the network are 1 and 3.
If the new matrix (C2) is multiplied by the original matrix (C), the number of
three-link paths will be identified in the product matrix. Figure 3-7 shows the matrices
used to determine the three-link paths in the network, as well as the original network.
The new matrix introduced, C3, provides the connection by three-link paths between
nodes. The new matrix was produced by multiplying the rows of the original matrix (C)
by the columns of the second matrix (C2). More pairs of nodes in the network are
connected by three-link paths than are connected by either two-link paths or direct
paths. By using this same procedure of multiplication, paths of any number of links can
be determined.
By adding the number of paths between nodes on the network (direct, two-link,
three-link, and four-link), the accessibility matrix, or T, can be created. Figure 3-7
illustrates each matrix added to create the accessibility matrix (labeled A in the
figure), along with the resulting accessibility matrix itself. By summing each row in the
accessibility matrix, we obtain a column vector whose entries give the accessibility of
each node in the network. Figure 3-7 illustrates the addition of rows and resultant
vectors. The summation is $A = \sum_{k=1}^{d} C^{k}$ over the k = 1...d matrices:
summing the typical elements $c^{k}_{ij}$ of the matrices up to the power d yields the
typical elements $a_{ij}$ of the accessibility matrix (A) (see Figure 3-7). A diameter of
four represents the number of links between the two most distant nodes; nodes 1 and 3
are separated by four links.
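The summation of powers can be sketched as follows. This is an illustrative Python
version under stated assumptions, not the software used in this study: the helper names
`matmul` and `accessibility_matrix` are hypothetical, and the loop stops as soon as the
running sum has no zero cells, which for the six-node example network happens at the
fourth power.

```python
# Binary connectivity matrix of the six-node example network (Table 3-6).
C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]


def matmul(a, b):
    """Multiply two square matrices of equal size, row by column."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def accessibility_matrix(c):
    """Sum C + C^2 + ... until every cell is non-zero; return (sum, power)."""
    n = len(c)
    power = [row[:] for row in c]
    total = [row[:] for row in c]
    steps = 1
    while any(value == 0 for row in total for value in row):
        if steps >= n:
            # A connected network needs fewer than n powers.
            raise ValueError("network is disconnected")
        power = matmul(power, c)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        steps += 1
    return total, steps


A, d = accessibility_matrix(C)
row_totals = [sum(row) for row in A]
print("powered to:", d)
print("row totals:", row_totals)
print("network accessibility:", sum(row_totals))
```

The row totals and the overall total agree with the accessibility matrix shown in
Figure 3-7 (14, 28, 14, 28, 38, 20, and 142).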
If the nodes were then ranked based on the accessibility vector, node 5 would be the
most accessible, as it has the highest value. Summing the rows across the
accessibility matrix, row 5 produces the highest total. This means that node 5 is the
most connected and accessible node in the network. Nodes 4 and 2 have the second
highest value, followed by node 6. Finally, the two least accessible nodes in the network
are 1 and 3; their row totals are the lowest.
Though it's possible to continue multiplying the matrices, the network cannot have
more than v-1 paths without establishing redundant paths. Multiplication is usually
Matrix C
Nodes V1 V2 V3 V4 V5 V6
V1    0  1  0  0  0  0
V2    1  0  0  0  1  0
V3    0  0  0  1  0  0
V4    0  0  1  0  1  0
V5    0  1  0  1  0  1
V6    0  0  0  0  1  0
Matrix C2
Nodes V1 V2 V3 V4 V5 V6
V1    1  0  0  0  1  0
V2    0  2  0  1  0  1
V3    0  0  1  0  1  0
V4    0  1  0  2  0  1
V5    1  0  1  0  3  0
V6    0  1  0  1  0  1
Matrix C3
Nodes V1 V2 V3 V4 V5 V6
V1    0  2  0  1  0  1
V2    2  0  1  0  4  0
V3    0  1  0  2  0  1
V4    1  0  2  0  4  0
V5    0  4  0  4  0  3
V6    1  0  1  0  3  0
Matrix C4
Nodes V1 V2 V3 V4 V5 V6
V1    2  0  1  0  4  0   Σ = 7
V2    0  6  0  5  0  4   Σ = 15
V3    1  0  2  0  4  0   Σ = 7
V4    0  5  0  6  0  4   Σ = 15
V5    4  0  4  0  11 0   Σ = 19
V6    0  4  0  4  0  3   Σ = 11
Network Connectivity = 74
Accessibility Matrix (A)
Nodes V1 V2 V3 V4 V5 V6
V1    3  3  1  1  5  1   Σ = 14
V2    3  8  1  6  5  5   Σ = 28
V3    1  1  3  3  5  1   Σ = 14
V4    1  6  3  8  5  5   Σ = 28
V5    5  5  5  5  14 4   Σ = 38
V6    1  5  1  5  4  4   Σ = 20
Network Accessibility = 142
Figure 3-7. Three-linkage paths and the accessibility matrix
Note: Accessibility matrix (A) is a composite of C, C2, C3, and C4, based on the sum of
the typical elements of these matrices.
stopped when the network has reached its diameter, because at that point each of the
nodes in the network is connected to each of the other nodes in the network. The matrix
is powered to its diameter, in this case d=4, to ensure that the summation of all
matrices up to that power will yield non-zero entries throughout the resultant matrix.
The greatest number of links between any two nodes is four. This does not mean that
there will be no redundancies in the network. Powering up to the diameter ensures that
all of the nodes are connected to each other node in the network by some path, be it
direct or indirect (via multiple links).
These techniques first appeared in a study on the Interstate Highway System in
the United States (Garrison 1960). Garrison's study is noted by Taaffe and Gauthier in
their 1973 publication Geography of Transportation. Garrison, as well as Taaffe and
Gauthier, raise concern over the redundancies that arise using matrix multiplication, as
they may not represent the distance-minimizing behavior of networks. In
response to these concerns, a shortest-path matrix multiplication procedure was
discussed. This procedure was introduced by Alfonso Shimbel in 1953. Shimbel's
procedure computes accessibility in terms of the distance between nodes, thereby
eliminating redundancies. The procedure calculates the length of the shortest path
between nodes. Taaffe and Gauthier (1973) suggest that for many real-world problems
redundancies are of no importance (p. 132). For this particular research, however,
redundancy in long-haul fiber networks is pertinent. The majority of the
links in this network are redundant, which must be considered in the practical application
of link removals due to disturbance. These redundancies help to ensure a
network's functionality should it experience a disturbance. They also allow for
increased traffic and data transfer during normal conditions.
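To illustrate the shortest-path idea attributed to Shimbel above, the sketch below
counts the fewest links between every pair of nodes in the six-node network of
Table 3-6. It uses a breadth-first search rather than the matrix procedure itself, so it
should be read as an equivalent shortcut under that assumption and not as Shimbel's
original formulation. The function name `shortest_link_counts` is hypothetical.

```python
from collections import deque

# Binary connectivity matrix of the six-node example network (Table 3-6).
C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]


def shortest_link_counts(c):
    """Return a matrix of the fewest links between each pair of nodes.

    Unreachable pairs are left as None.
    """
    n = len(c)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if c[u][v] and dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


D = shortest_link_counts(C)
diameter = max(max(row) for row in D)
shortest_path_totals = [sum(row) for row in D]
print("diameter:", diameter)
print("row totals:", shortest_path_totals)
```

Here a lower row total means a more accessible node, since fewer links are needed to
reach the rest of the network. Node 5 again ranks first and nodes 1 and 3 last, without
any redundant paths entering the count.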
To review, the connectivity of a node is determined by summing across its row
of the resultant matrix, and by summing the row totals of the resultant matrix, the
connectivity of the entire network can be determined (see Matrix C4, Network
Connectivity, in Figure 3-7). The measures of accessibility and connectivity give nearly
identical indications of nodal/link importance to a network (compare the two in
Figure 3-7). Both accessibility and connectivity measures have been demonstrated in
this chapter.
However, the following analysis concentrates on the connectivity of nodes in order to
determine which nodes are best connected and most important to the network rather
than which nodes are most accessible.
Weighted or Valued Graphs and Shortest-Path
Networks can be represented as "valued graphs." This adds weights to the
network's links, highlighting their capacity or flow potential, or distance. If we take the
original graph and assign weights to the links we have a weighted network. Figure 3-8
illustrates the network as a weighted graph, with values assigned to each link. The new
weighted network is represented in a matrix with the new weighted values in Table 3-8.
The values added to the weighted network represent distance between the
corresponding pairs of nodes. In Table 3-8, the length (or value) of existing direct
connections between pairs of nodes is represented in the corresponding matrix cells.
For those pairs of nodes that do not have direct connections between them, an ∞ is
recorded in the cell, representing infinity. A connection between a node and itself is
meaningless, so those cells have a zero recorded. These self-connection cells
always fall along the main diagonal of the matrix. From the matrix we can read the
distance between nodes joined by direct connections and derive the distance over
indirect connections. To travel
from node 1 to node 2 a distance of 20 is traveled, accomplished in a direct connection.
To travel from node 3 to 5 however, an indirect connection involving node 4 is required.
The distance from node 3 to 4 is 20, and the distance from node 4 to 5 is 10. The
total distance from node 3 to 5 is then 30.
Figure 3-8. Weighted network
Table 3-8. Weighted network represented as a matrix
Nodes V1 V2 V3 V4 V5 V6
V1    0  20 ∞  ∞  ∞  ∞
V2    20 0  ∞  ∞  30 ∞
V3    ∞  ∞  0  20 ∞  ∞
V4    ∞  ∞  20 0  10 ∞
V5    ∞  30 ∞  10 0  5
V6    ∞  ∞  ∞  ∞  5  0
In order to answer questions of accessibility for more complex, weighted
matrices, a procedure similar to matrix multiplication is required. The new procedure is
less complex than matrix multiplication. Element-by-element addition ($x \cdot y = x + y$)
replaces element-by-element multiplication, and instead of summing the values, the
minimum value is inserted in the appropriate cell of the new matrix
[$x + y = \min(x, y)$]. The value of cell ij becomes the minimum of the sums over these
two-stage links from origin i to k and then to destination j, or
$c^{2}_{ij} = \min_{k}(c_{ik} + c_{kj})$. The previous equation for summing the products
in matrix multiplication was $c^{2}_{ij} = \sum_{k} c_{ik} c_{kj}$. Figure 3-9
illustrates the element-by-element addition and determination of the minimum sum. It
also indicates where the new value would be inserted in the new matrix, L2. In the
example shown in Figure 3-9, the least-path linkage between nodes 1 and 5 is the
two-step path through node 2, which has a length of 50 (20 + 30). By using this
methodology, the successive powers of the weighted matrix are calculated. The results
indicate the minimum distance between each pair of nodes. Powering continues while
∞ values remain in the matrix. Once the matrix contains no ∞ entries (the main diagonal
remains zero), a minimum distance has been achieved between each and every pair of
nodes in the network (Taaffe & Gauthier, p. 141). This method has been compared to
Shimbel's minimum connectivity procedure, which involves powering the matrix until
there are no zero entries left in the matrix (Shimbel 1953, Taaffe & Gauthier 1973).
Taaffe and Gauthier compared the addition method with Shimbel's method and found that
the structural relationships do not change, but that the distance criteria might give a
more refined measurement of nodal accessibility.
Sums evaluated for cell (V1, V5) of L2:
(V1, V1) + (V1, V5) = 0 + ∞ = ∞
(V1, V2) + (V2, V5) = 20 + 30 = 50
(V1, V3) + (V3, V5) = ∞ + ∞ = ∞
(V1, V4) + (V4, V5) = ∞ + 10 = ∞
(V1, V5) + (V5, V5) = ∞ + 0 = ∞
(V1, V6) + (V6, V5) = ∞ + 5 = ∞
Minimum sum = 50
Figure 3-9. Indirect Connections in a Weighted Network
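The addition-and-minimum procedure can be sketched in Python as follows. The weights
are those of Table 3-8, with `float("inf")` standing in for ∞. The function names
`min_plus` and `minimum_distances` are assumptions made for this illustration, and the
loop stops when a further pass changes nothing, which is a slightly stricter stopping
rule than waiting for the last ∞ to disappear.

```python
INF = float("inf")

# Weighted matrix of Table 3-8; INF marks pairs without a direct link.
L = [
    [0, 20, INF, INF, INF, INF],
    [20, 0, INF, INF, 30, INF],
    [INF, INF, 0, 20, INF, INF],
    [INF, INF, 20, 0, 10, INF],
    [INF, 30, INF, 10, 0, 5],
    [INF, INF, INF, INF, 5, 0],
]


def min_plus(a, b):
    """Combine two matrices by adding element pairs and keeping the minimum."""
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def minimum_distances(weights):
    """Repeat the min-plus step until the matrix no longer changes."""
    current = weights
    while True:
        following = min_plus(current, weights)
        if following == current:
            return current
        current = following


L2 = min_plus(L, L)        # shortest distances using at most two links
D = minimum_distances(L)   # shortest distances over paths of any length

print("V1 to V5 within two links:", L2[0][4])
print("V3 to V5 within two links:", L2[2][4])
print("shortest distances from V1:", D[0])
```

The two-link results match the worked examples in the text: 50 from node 1 to node 5
and 30 from node 3 to node 5.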
Node and Link Removal
To determine the degree of importance of any node to a network, it can be
removed from the network (Haggett, Cliff, & Frey 1977b, p. 322). The remaining network
is then recalculated to determine changes in diameter, row totals ($R_i$), network
connectivity ($\sum_i R_i$), and disconnects. The matrix is repowered after
a node or pair of nodes has been removed. The results are compared to the original
network values. When a network has been powered to its diameter ($c_{ij} \neq 0$), the
row sums of the resultant matrix indicate the connectivity of each node in the network.
By summing the row totals, the connectivity of the entire network can be derived.
The network connectivity index (NCI) is obtained by summing the powered matrix's row
totals: $NCI = \sum_i R_i$. Nodes whose removal causes the largest changes in network
connectivity, and in the i and j values, exercise the greatest degree of importance
within the network (Haggett, Cliff, & Frey 1977b).
The method is demonstrated using the network shown in Figure 3-10. The network
is comprised of six nodes and seven links. The network was powered to a diameter of
three, and the resultant matrix is shown (Figure 3-10). The NCI of the complete network
is 34,525,000 (the sum of row totals). Figure 3-11 illustrates the impact of node removal
upon a network. Node V3 was removed from the network. This removal reduces the
network to five nodes and five links (Figure 3-11). This network was powered to a
diameter of three. The sum of rows of the resultant matrix was less than that of the
complete network, as expected. With the removal of node V3, the connectivity of the
network dropped to 14,625,000. This is a 57.6% decrease in the NCI value. Figure 3-12
illustrates the removal of a link from a network. The link connecting node V4 and node
V5 was removed from the network. The remaining network was powered to a diameter
of three. The row totals of the resultant matrix were summed to determine the
connectivity of the network in the absence of this link. The connectivity drops to
24,205,000, decreasing the connectivity of the network by 30%. The NCI percentage
change is relatively high for the examples illustrated in Figures 3-11 and 3-12;
however, the network contains a small number of both links and nodes. The node removal
reduced the number of nodes in the network by 16.7% (from 6 nodes to 5), while the link
removal reduced the number of connections in the network by 14% (from 7 links to 6).
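The removal procedure can be sketched in Python. Several details here are assumptions
and are marked as such in the code: the link weights are read from Figures 3-10 to 3-12
as far as they are legible, the weight of 20 on the V3-V5 link and the exponent of four
are inferred because together they reproduce the NCI totals printed in those figures,
and absent links are entered as zero for the multiplication. The helper names are
hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices of equal size, row by column."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def nci(matrix, exponent):
    """Network connectivity index: sum of all cells of matrix ** exponent."""
    powered = matrix
    for _ in range(exponent - 1):
        powered = matmul(powered, matrix)
    return sum(sum(row) for row in powered)


def remove_node(matrix, index):
    """Drop the row and column of one node (0-based index)."""
    return [
        [value for j, value in enumerate(row) if j != index]
        for i, row in enumerate(matrix)
        if i != index
    ]


def remove_link(matrix, i, j):
    """Set the link between nodes i and j (0-based) to zero in both directions."""
    copy = [row[:] for row in matrix]
    copy[i][j] = copy[j][i] = 0
    return copy


# Assumed weights for the six-node, seven-link network of Figure 3-10.
# The V3-V5 weight (20) is inferred, not read directly from the figure.
W = [
    [0, 20, 0, 25, 0, 0],
    [20, 0, 0, 0, 30, 0],
    [0, 0, 0, 20, 20, 0],
    [25, 0, 20, 0, 10, 0],
    [0, 30, 20, 10, 0, 5],
    [0, 0, 0, 0, 5, 0],
]
EXPONENT = 4  # inferred: this power reproduces the totals in the figures

nci_full = nci(W, EXPONENT)
nci_without_v3 = nci(remove_node(W, 2), EXPONENT)
nci_without_v4_v5 = nci(remove_link(W, 3, 4), EXPONENT)


def percent_drop(before, after):
    return round((before - after) / before * 100, 1)


print(nci_full, nci_without_v3, nci_without_v4_v5)
print(percent_drop(nci_full, nci_without_v3))
print(percent_drop(nci_full, nci_without_v4_v5))
```

Under these assumptions the three totals come out as 34525000, 14625000, and 24205000,
with decreases of 57.6% and 29.9%, in line with the values reported above.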
The methods discussed in this chapter can be used to assess changes in
connectivity and accessibility, at node and network levels, in response to the removal of
a link or node. This chapter has discussed the data, methodology, and analysis that are
used in the following chapters. The U.S. Internet backbone network is analyzed in
Chapters 4 and 5. Node removal and matrix multiplication have been introduced in this
chapter, and they are the main tools used in the analysis for both Chapters 4 and 5.
Network changes will be measured for each removal scenario by comparing measurements
of diameter, row totals, NCI, and ranking to those of the complete network. Chapter 4
describes the unweighted analysis of the U.S. Internet backbone network. Single-node and
pair-node removal scenarios are performed to determine the effects upon the network.
The analysis performed in Chapter 5 also uses matrix multiplication and node and link
removal techniques, adding weights to the network. A better representation of the actual
Internet backbone network is achieved by using valued links.
Weighted matrix
Nodes V1 V2 V3 V4 V5 V6
V1    0  20 ∞  25 ∞  ∞
V2    20 0  ∞  ∞  30 ∞
V3    ∞  ∞  0  20 20 ∞
V4    25 ∞  20 0  10 ∞
V5    ∞  30 20 10 0  5
V6    ∞  ∞  ∞  ∞  5  0
Resultant matrix
Nodes V1      V2      V3      V4      V5      V6     Row totals
V1    2023125 300000  1082500 440000  2182500 50000  Σ = 6078125
V2    300000  2712500 1435000 2067500 440000  298750 Σ = 7253750
V3    1082500 1435000 1340000 950000  950000  182500 Σ = 5940000
V4    440000  2067500 950000  2108125 1060000 197500 Σ = 6823125
V5    2182500 440000  950000  1060000 2953125 40000  Σ = 7625625
V6    50000   298750  182500  197500  40000   35625  Σ = 804375
Network Connectivity Index (NCI) Σ = 34525000
Figure 3-10. Example of a complete network
Weighted matrix
Nodes V1 V2 V4 V5 V6
V1    0  20 25 ∞  ∞
V2    20 0  ∞  30 ∞
V4    25 ∞  0  10 ∞
V5    ∞  30 10 0  5
V6    ∞  ∞  ∞  5  0
Resultant matrix
Nodes V1      V2      V4      V5      V6     Row totals
V1    1773125 0       0       1742500 0      Σ = 3515625
V2    0       2352500 1627500 0       238750 Σ = 4218750
V4    0       1627500 1168125 0       157500 Σ = 2953125
V5    1742500 0       0       1773125 0      Σ = 3515625
V6    0       238750  157500  0       25625  Σ = 421875
Network Connectivity Index (NCI) Σ = 14625000
Figure 3-11. Example of a node removed from a network
Weighted matrix
Nodes V1 V2 V3 V4 V5 V6
V1    0  20 ∞  25 ∞  ∞
V2    20 0  ∞  ∞  30 ∞
V3    ∞  ∞  0  20 20 ∞
V4    25 ∞  20 0  ∞  ∞
V5    ∞  30 20 ∞  0  5
V6    ∞  ∞  ∞  ∞  5  0
Resultant matrix
Nodes V1      V2      V3      V4      V5      V6     Row totals
V1    1660625 300000  912500  240000  1410000 50000  Σ = 4573125
V2    300000  2322500 1275000 1162500 200000  258750 Σ = 5518750
V3    912500  1275000 1260000 300000  300000  172500 Σ = 4220000
V4    240000  1162500 300000  1460625 940000  75000  Σ = 4178125
V5    1410000 200000  300000  940000  2275625 0      Σ = 5125625
V6    50000   258750  172500  75000   0       33125  Σ = 589375
Network Connectivity Index (NCI) Σ = 24205000
Figure 3-12. Example of a link removed from a network
THE U.S. INTERNET BACKBONE NETWORK:
AN UNWEIGHTED ANALYSIS
Protection of critical infrastructure in the United States has been a hot topic in
recent months. One of the most pressing issues is the assurance and protection of
infrastructures that directly affect the economy and financial sector. Major disturbances
to the Internet's infrastructure have occurred, directly affecting, among other things,
the financial market. The fall of the Twin Towers during the terrorist attacks of 9/11
and the northeastern power outages of August 2003 each led to Internet disruption due
to physical failure.
It must be noted that disturbances to physical Internet infrastructure occur
frequently on small scales. These disturbances are usually accidental, often caused by
backhoes and shovels. They are less publicized, in part because the
disruption is minor and generally affects only local service.
September 11th, 2001 saw the largest loss of physical telecommunication
infrastructure (FCC 2001). Verizon alone lost its central office¹ along with 182,000
voice circuits, more than 1.6 million data circuits, and more than 11,000 lines serving
Internet service providers (GAO 2003). With the loss of Verizon's central office, 34,000
businesses lost their telecommunication service. Verizon is an example of just one
telecommunication provider that was dramatically affected by the terrorist attacks; many
other telecommunication providers lost physical infrastructure. Though the exchange
and clearing organizations were undamaged by the terrorist attacks, the economic
disruption was severe due to the loss of telecommunication infrastructure (GAO 2003).
¹ A central office is a facility, owned and operated by a telecommunication firm, which
houses the switching equipment that links customers to voice and data networks within
and outside the service area.
A power blackout on August 14, 2003, simultaneously affected Detroit, Cleveland,
Columbus, and Long Island, New York, as well as the Canadian cities of Toronto and Ottawa
(Rosenblum 2003, Semple 2003, NASA 2003). Figures 4-1 and 4-2, courtesy of Chris
Elvidge of the U.S. Air Force, have been made available by the NASA Earth Observatory
in their collection of unique images. Figures 4-1 and 4-2 illustrate the widespread power
outage. The top image was taken on August 14, 2003, roughly twenty hours before the
blackout occurred. The lower image was taken on August 15, 2003, about seven hours
after the blackout began: a post-September 11th reminder of the vulnerability of our
infrastructure and the interconnectedness of our cities. A series of events caused a
domino effect across interconnected power grids. The result was a widespread outage
(NASA 2003). The cities affected by the blackout experienced varying degrees of impact
upon themselves and their neighbors. The same type of domino effect has occurred
within the nation's telecommunication infrastructure network. Telecommunication and
computer equipment is often dependent upon the power grid, though there are
exceptions in which on-site power generation is used to avoid dependency on the power
grid (particularly in data and colocation centers [interconnection hubs for networks]).
Sixty Hudson Street, located in the financial district in Manhattan, is one of the
main hubs of Internet activity and interconnection in the world. Unlike other data and
interconnection facilities, this building is reliant upon the power grid and suffered
major disturbance during the power outages of August 2003. With little time bought by
generators, many companies that housed equipment in the building were soon
negatively affected (Careless 2003). Figure 4-1 further illustrates the interdependencies
between Internet infrastructure and the power grid. Figure 4-3 shows a map of Internet
routing outages that occurred during the August 14, 2003 power outages. The
interdependency of networks increases their vulnerability to disturbances, as illustrated
by the terrorist attacks of 9/11, the power outages, and other disturbances.
Figure 4-1. Image taken August 14, 2003, 9:29 p.m. EDT, about 20 hours before
blackout (NASA Earth Observatory, 2003)
Figure 4-2. Image taken August 15, 2003, 9:14 p.m. EDT, about 7 hours after blackout
(NASA Earth Observatory 2003)
Figure 4-3. Internet routing outages during the 2003 blackout. Source: www.renesy.com
This chapter focuses on long-haul fiber, where, like the power grids across the
U.S., countless overlaps and interconnections occur to create one large network. The
disturbance of each node holds the potential to disrupt the overall network. The more
important a node is to the overall network, the more effect its disturbance will have upon
the connectivity of the network. A disruption in the network can also have a major effect
on sectors reliant upon that network. Economic sectors are increasingly dependent upon
telecommunication infrastructure, particularly long-haul fiber. Should particular long-haul
fiber routes endure a major disturbance, the economy would be directly affected.
Detecting vulnerabilities is the beginning of protection.
This chapter introduces a methodology to identify the most critical links and nodes
in the U.S. Internet backbone network. The analysis will proceed in two stages. First, an
unweighted analysis will consider each link to be of equal importance within the network.
In short, the unweighted analysis will not take into account the amount of bandwidth that
connects a node to other nodes. Second, weights will be added in Chapter 5, recognizing
the amount of bandwidth each link represents. In this chapter, a method for identifying a
hierarchy of nodes and links within an undirected, unweighted network is introduced.
The method applies an unweighted scenario to the Internet backbone network in the
United States using the graph theoretic concepts discussed in Chapter 3. This chapter
will assess the vulnerability of the long-haul fiber network in the U.S. in several
respects:
* How node removal/disruption affects the nodality (and ranking) of all other nodes and
changes the ranking of all remaining links in the network;
* How node removal/disruption affects the overall connectivity of the network as
measured by the sum total of all nodality indices after removal/disruption;
* How link removal/disruption affects the nodality (and ranking) of nodes;
* How link removal/disruption affects the overall connectivity of the network as
measured by the sum total of all nodality indices;
* Changes in the rank and importance of remaining links.
A total of 218 nodes representing cities in the U.S. (including Alaska and Hawaii)
were used for the unweighted and weighted analysis (Figure 4-4). Table 4-1 shows a
complete list of the C/MSAs used in the analysis. Basic network measurements were
performed using the Internet Backbone Network data to determine network connectivity.
A symmetrical binary connectivity matrix is used to represent the network. The number
of binary links represented in the matrix (1042) was divided in half (521) to identify the
total number of unique links in the network and avoid double counts. This partially
addresses the redundancy that occurs with matrix representation of a network.
Figure 4-5 illustrates the network links used in both the unweighted and weighted
analysis. The self-to-self links represented in the principal diagonal are not included in
the totals, as it is assumed that nodes are not connected to themselves. The gamma
and alpha indices are defined as:
$\gamma = \frac{\text{actual edges}}{\text{max edges}} = \frac{e}{3(v-2)}$ and
$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5}$
Figure 4-4. Nodes of U.S. Internet backbone network
Note. Honolulu, HI & Anchorage, AK are not included in this map, but are Internet Backbone Nodes used in this analysis.
Figure 4-5. Links of U.S. Internet backbone network
Table 4-1. List of consolidated metropolitan statistical areas (CMSA) and metropolitan
statistical areas (MSA) in U.S. Internet backbone network
Ardmore Colorado Springs
Austin-San Marcos Columbus
Baton Rouge Corpus Christi
Bellevue Dallas-Fort Worth
Birdstown Daytona Beach
Blacksburg Des Moines
Bohemia Detroit-Ann Arbor-Flint
Boise Devils Lake
Bowling Green Dunlap
Brownsville-Harlingen-San Benito Eau Claire
Bryan-College Station El Paso
Buffalo-Niagara Falls Elkhart-Goshen
Cedar Knolls Erie
Celina Estill Springs
Chapel Hill Eugene-Springfield
Charleston-North Charleston Fargo-Moorhead
Charlotte-Gastonia-Rock Hill Flat Creek
Florence Kansas City
Fort Myers-Cape Coral Lafayette
Gainesboro Las Vegas
Garden City Lexington-Fayette
Glenview Little Rock
Grand Forks Livingston
Grand Rapids-Muskegon-Holland Longview-Marshall
Green Bay Los Angeles-Riverside-Orange County
Greensboro-Winston-Salem-High Point
New York-Northern New Jersey-Long Island
Norfolk-Virginia Beach-Newport News
Salt Lake City-Ogden
San Francisco-Oakland-San Jose
San Luis Obispo-Atascadero-Paso Robles
Santa Barbara-Santa Maria-Lompoc
Richmond-Petersburg Tracy City
Roanoke Rapids Victoria
Rochelle Park Waco
Rocky Mount Washington-Baltimore
Rolling Meadows Waterford
West Palm Beach-Boca Raton
The gamma and alpha indices were computed as standard measures of network
connectivity. The long-haul fiber network contains 218 nodes or vertices and 521 links or
edges. The gamma index is simply the ratio of the number of nodes in a network to the
maximum number possible in that network: y 3(v- 2) 648 Hence, the gamma index for
the Internet backbone network in the U.S. is 0.804. In terms of maximal connectivity, this
means that the network is 80.4% connected.
The alpha index is a ratio measure of the number of actual circuits to the maximum
number possible in the network:
$$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e-v+1}{2v-5} = \frac{521-218+1}{2(218)-5} = \frac{304}{431} = 0.705$$
Like the gamma index, the alpha index ranges from 0 to 1. A value of zero represents a
network that is minimally connected, and a value of 1 represents a maximally connected
network. The network linkage, or circuitry, is 70.5% of the maximum.
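The two indices can be checked with a few lines of Python. This is only a sketch of the arithmetic stated above; the function names are illustrative and do not come from the original analysis.

```python
def gamma_index(edges: int, vertices: int) -> float:
    """Actual links divided by the maximum number of links, 3(v - 2)."""
    return edges / (3 * (vertices - 2))


def alpha_index(edges: int, vertices: int) -> float:
    """Actual circuits (e - v + 1) divided by the maximum circuits, 2v - 5."""
    return (edges - vertices + 1) / (2 * vertices - 5)


# Node and link counts reported for the long-haul fiber network.
gamma = gamma_index(edges=521, vertices=218)
alpha = alpha_index(edges=521, vertices=218)

print(f"gamma = {gamma:.3f}")  # gamma = 0.804
print(f"alpha = {alpha:.3f}")  # alpha = 0.705
```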
Table 4-2 shows binary data for the Internet backbone network in the U.S. for
2003, covering 218 metropolitan areas. The binary totals represent the number of nodes
to which a node is directly linked. Chicago-Gary-Kenosha has 36 binary links, more than
any other metropolitan area. This means that Chicago was directly connected to 36 other
cities within the U.S. Internet backbone network. New York and Dallas followed closely
with 33 and 30 links, respectively. As shown in Table 4-2, Washington-Baltimore has 28
separate long-haul links to metropolitan areas, ranking fourth. Atlanta and
San Francisco-Oakland-San Jose have 27 and 26 binary links, respectively, filling the
fifth and sixth ranks. These cities have the most binary connections to other major
metropolitan areas in the U.S.
Table 4-2. Long-haul fiber optic binary connections in U.S. metropolitan areas
Metropolitan Area (C/MSA)                   Binary Connections 2003 (Rank)
Chicago-Gary-Kenosha                        36 (1)
New York-Northern New Jersey-Long Island    33 (2)
Dallas-Fort Worth                           30 (3)
Washington-Baltimore                        28 (4)
Atlanta                                     27 (5)
San Francisco-Oakland-San Jose              26 (6)
Denver-Boulder-Greeley                      19 (7)
Los Angeles-Riverside-Orange County         19 (7)
Kansas City                                 18 (9)
Houston-Galveston-Brazoria                  15 (12)
Miami-Fort Lauderdale                       14 (13)
St. Louis                                   14 (13)
Tampa-St. Petersburg-Clearwater             13 (15)
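A binary total of the kind reported in Table 4-2 is simply a row sum of the binary connectivity matrix. The sketch below illustrates this on a hypothetical four-node network; the node labels and links are invented for illustration and are not part of the 2003 data.

```python
# Hypothetical four-node network (labels and links are illustrative only).
nodes = ["A", "B", "C", "D"]
adjacency = [
    [0, 1, 1, 1],  # A is linked to B, C and D
    [1, 0, 1, 0],  # B is linked to A and C
    [1, 1, 0, 0],  # C is linked to A and B
    [1, 0, 0, 0],  # D is linked to A only
]

# The binary total of a node is the sum of its row: the number of direct links.
binary_totals = {name: sum(row) for name, row in zip(nodes, adjacency)}

# Rank nodes from most to fewest direct links.
ranked = sorted(binary_totals.items(), key=lambda item: item[1], reverse=True)

for name, total in ranked:
    print(name, total)
```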
Matrix multiplication was the main tool used in the unweighted analysis. As
explained and demonstrated in Chapter 3, matrix multiplication can help to determine
nodal accessibility as well as to establish a ranking of nodes and links within a network.
The connectivity matrix is a model of the network, allowing various scenarios to be
simulated in order to learn more about the network. Matrix multiplication was used to
determine the nodes most critical to the overall network and to carry out various
node-removal scenarios, both single-node and paired-node. Once the city ranking for the
complete network was established, the nodes to be used in the removal scenarios were
identified. Each of the nodes ranked in the top 12 in terms of connectivity for the
original, fully connected network was used in the single-node removal scenarios. Every
possible pairing of the top 12 nodes was also removed from the network, accounting for
72 double-node removal scenarios. Five measurements were recomputed for each scenario
to show a node's importance to the network and the effect of its removal upon the entire
network: diameter, disconnects, the Unweighted Relative Connectivity Index (URCI), the
Network Connectivity Index (NCI), and the percentage change in connectivity.
For each removal scenario, the binary connectivity matrix was multiplied to its
diameter. This means multiplying the matrix by itself repeatedly until no zeros remain in
the product. In matrix multiplication, a zero represents a disconnect. The absence of
zeros indicates that each node in the network is connected to each of the other nodes in
the network. When a network reaches its diameter, all of the nodes in the network are
connected. The complete matrix was multiplied to its diameter, 11. This means the matrix
was multiplied 10 times before each node in the network was connected to each of the
other nodes in the network through some path.
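The stopping rule described above (keep multiplying until the product contains no zeros) can be sketched as follows. The four-node network is hypothetical, the function names are illustrative, and the sketch assumes a connectivity matrix with zeros on its diagonal, which the text does not state explicitly. The `max_power` cap is an added safeguard, since a network with a disconnected node would never become zero-free.

```python
def multiply(a, b):
    """Plain matrix product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def first_zero_free_power(adjacency, max_power=50):
    """Return (power, product) for the first power with no zero entries."""
    product = [row[:] for row in adjacency]
    power = 1
    while any(0 in row for row in product):
        if power >= max_power:
            raise ValueError("zeros remain; the network may be disconnected")
        product = multiply(product, adjacency)
        power += 1
    return power, product


# Hypothetical four-node network (illustrative only).
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

power, product = first_zero_free_power(adjacency)
row_totals = [sum(row) for row in product]
print(power, row_totals)  # 4 [25, 23, 23, 13]
```

Reaching power 4 takes three multiplications, in the same way that powering the complete matrix to 11 takes 10. The row totals of the final product are the quantities that feed the ranking of nodes.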
Various removal scenarios completely disconnected particular nodes from the
entire network. These are called "disconnects." A disconnected node is severed from the
network, and no amount of continued powering of the matrix will connect it. This was
taken into account when determining the presence or absence of zeros among the
connected nodes of the network.
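A node-removal scenario and its resulting disconnects can be sketched as below. This is not the procedure used in the original analysis, which relied on inspecting the powered matrix; it swaps in a breadth-first search and treats every node outside the largest connected group as a disconnect, which is an assumption made for illustration. The network and function names are hypothetical.

```python
from collections import deque


def remove_node(adjacency, index):
    """Return a copy of the matrix with one node's row and column deleted."""
    return [[value for j, value in enumerate(row) if j != index]
            for i, row in enumerate(adjacency) if i != index]


def connected_groups(adjacency):
    """Group node indices into connected components using breadth-first search."""
    seen, groups = set(), []
    for start in range(len(adjacency)):
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbor, linked in enumerate(adjacency[node]):
                if linked and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


# Hypothetical four-node network (illustrative only).
nodes = ["A", "B", "C", "D"]
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

# Scenario: remove node A and see which remaining nodes are cut off.
remaining = [name for name in nodes if name != "A"]
reduced = remove_node(adjacency, nodes.index("A"))
groups = connected_groups(reduced)
largest = max(groups, key=len)
disconnected = [remaining[i] for g in groups if g is not largest for i in g]
print(disconnected)  # ['D']
```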
After the matrix was powered to its diameter, the rows from the product of the last
powering were totaled. These totals were used to determine which 25 nodes were at the
top of the hierarchy (Table 4-3). An index was then created from the row totals: each row
total was divided by the minimum row total of the top 25 nodes in the network. The
minimum observation was Salt Lake City-Ogden, with a row total of 538479367663, so its
value was set to 1.0 to begin the index. This index is the unweighted relative
connectivity index (URCI):
$$\mathrm{URCI}_i = \frac{r_i}{r_{\min}}, \quad i = 1, \ldots, N, \text{ where } N = 25$$
where $r_i$ is the row total of node $i$ and $r_{\min}$ is the minimum row total among the
top 25 nodes. For example, to calculate the URCI value for Atlanta
(row total = 1544357876452) within a fully connected network,
$$\mathrm{URCI}_{\text{Atlanta}} = \frac{1544357876452}{538479367663} = 2.868$$
Figure 4-6 illustrates the hierarchy of the 218 nodes in the U.S. based on the URCI.
Figure 4-7 shows the geographical distribution of the top 25 nodes based on the URCI.
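The URCI calculation can be reproduced from the two row totals quoted above. The sketch uses only those two values; the dictionary layout and variable names are illustrative.

```python
# Row totals quoted in the text for the fully connected network.
row_totals = {
    "Atlanta": 1544357876452,
    "Salt Lake City-Ogden": 538479367663,  # minimum of the top 25 nodes
}

# Divide every row total by the minimum so the lowest-ranked node scores 1.0.
minimum = min(row_totals.values())
urci = {city: total / minimum for city, total in row_totals.items()}

for city, value in urci.items():
    print(f"{city}: {value:.3f}")
# Atlanta: 2.868
# Salt Lake City-Ogden: 1.000
```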
The network connectivity index (NCI) was obtained by summing the converted row
totals for the entire network. The NCI was created after the row totals for each
multiplied matrix scenario had been converted to the URCI. The converted values were
summed to give a value representing the overall connectivity of the entire network for
each removal scenario. The simple NCI model is
$$X_{\text{total}} = \sum_{i=1}^{218} X_i$$
where $X_i$ is the converted (URCI) value of node $i$. The NCI of the fully connected
network is 99.499. This means that the row totals, after being converted using the URCI
and summed, had a value of 99.499.
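As a minimal sketch, the NCI is the sum of the converted values over all nodes. The four values below are hypothetical and stand in for the 218 converted row totals, which are not listed in this section.

```python
# Hypothetical converted (URCI) values for a four-node network.
# Nodes ranked below the reference node score less than 1.0.
urci_values = [2.5, 1.75, 1.0, 0.5]

# The NCI is the sum of the converted values over every node.
nci = sum(urci_values)
print(nci)  # 5.75
```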
The percentage change value is used to illustrate the amount of change in network
connectivity from each network scenario compared to the original, fully connected
network. The change was calculated by subtracting the NCI of a given scenario from
the original connectivity value. The result was then divided by the original value, 99.499.
The equation is
$$\% = \frac{x_1 - x_2}{x_1}$$
where $x_1$ is the NCI of the original network and $x_2$ is the NCI of the scenario. For
example, to calculate the percentage change in the connectivity of the network with
Chicago-Gary-Kenosha removed (connectivity 36.909) in comparison to the original
network, the equation is
$$\frac{99.499 - 36.909}{99.499} = 0.629$$
The final result gives the percentage of change in network connectivity compared to the
fully connected network.
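The percentage change can be evaluated directly from the two NCI values quoted above. The function name is illustrative; the inputs are the figures stated in the text.

```python
def percentage_change(original_nci: float, scenario_nci: float) -> float:
    """Share of the original connectivity that is lost in a removal scenario."""
    return (original_nci - scenario_nci) / original_nci


# NCI of the fully connected network and of the network without
# Chicago-Gary-Kenosha, as quoted in the text.
change = percentage_change(original_nci=99.499, scenario_nci=36.909)
print(f"{change:.3f}")
```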
Figure 4-6. Cities of U.S. Internet backbone network based on URCI, using graduated symbols to represent connectivity importance.
Figure 4-7. Top 25 nodes in U.S. Internet backbone network, using graduated symbols based on URCI bullets.