A network and security analysis of the U.S. internet backbone network


Material Information

Physical Description:
v, 225 leaves : ill. ; 29 cm.
McIntee, Angela, 1978-

Subjects / Keywords:
Geography thesis, Ph. D   ( lcsh )
Dissertations, Academic -- Geography -- UF   ( lcsh )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )


Thesis (Ph. D.)--University of Florida, 2004.
Includes bibliographical references.
Statement of Responsibility:
by Angela McIntee.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 022440119








ABSTRACT

1 PUTTING THINGS INTO PERSPECTIVE

Introduction
Statement of the Problem and Hypotheses
The Internet

2 NETWORK CONCEPTS: A LITERATURE REVIEW

Network Concepts and Constructs
Network Analysis
Geographic Literature of Networks & Information
History of the Internet and Parallel Developments in Telecommunications
Telecommunications and the City
Data and Methodology

3 NETWORK ANALYSIS

Analysis
Alpha Index
Connectivity Matrix
Weighted or Valued Graphs and Shortest-Path
Conclusion


Complete Matrix
Node-Removal Scenarios
Pair Removals
Summary


Bandwidth: An Overview
Fully Connected Weighted Network
Conclusion

BACKBONE NETWORK ANALYSIS

Regression Analysis
Kriging
Surface Results
Expanded Regression Analysis
Important Nodes in the Internet Backbone Network
Summary and Conclusions
Directions for Future Research

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



Angela McIntee

May 2004

Chairperson: Timothy J. Fik
Major Department: Geography

Currently, as both the public and private sectors have become increasingly reliant on Internet-related infrastructure, it is essential that the most valuable components of the telecommunications system be identified and protected from disruptions and sabotage, to ensure the proper functioning of the nation's economy and communications networks. Any disruption that might lead to the loss of a network component could have devastating consequences for both the overall network and the economy at large. In light of these concerns, our study focuses on U.S. telecommunication networks and the Internet backbone. It highlights the importance of spatial variability and discusses the potential susceptibility of network components. More specifically, it outlines a methodology for identifying critical nodes and links in the U.S. Internet backbone network. This type of analysis will aid policy-makers in allocating resources when determining which infrastructures are most important to protect or duplicate to minimize the threat of a disruption. Identifying critical components is essential for prioritization schemes that may be developed to add redundancy and circuits to the network, ensuring its proper functioning in the event of an attack or disaster. A graph theoretic approach is used to define and rank nodes and links and to measure their importance to the overall network, using both weighted and unweighted scenarios. Implications of various node- and linkage-removal scenarios are also discussed. Empirical results suggest clustering of telecommunication infrastructure and bandwidth within large metropolitan locations, with regional variations in connectivity that are not simply a matter of population size. Understanding the Internet as a "network of networks" will aid in protecting and preserving the network, lessen component susceptibility to disruptions, and enhance its overall efficiency.



On January 25, 2003, the Sapphire worm was launched by malicious hackers and reached global diffusion within 10 minutes. In the process, several bank automated teller machine (ATM) networks failed, airline reservation systems ground to a halt, and the entire Internet suffered a global slow-down. The effects of the Sapphire worm not only show the interdependency of telecommunication networks but also remind us of their vulnerability. Before this cyber-attack, the events of September 11, 2001, brought a heightened sense of awareness and increased focus on the security of critical infrastructure in the U.S. As the world becomes increasingly connected through the Internet and advanced telecommunications technology, more attention has turned to issues of cyber terrorism. Increasing dependence on ever-expanding telecommunications and information technology (IT) has brought new concerns over security and susceptibility to attacks. In the current atmosphere of tense international relations, it is now imperative that law, policy, regulation, and technology be more fully integrated in the field of telecommunications. This will help to ensure the stability and security of the nation's critical, sophisticated, and valuable information infrastructures. Understanding the geography of this infrastructure is imperative: the nation's economic security is highly reliant upon this vital resource to support expanding financial networks and information


This research explores the location and connective properties of the Internet and advanced telecommunications networks, concentrating on physical infrastructure and, more specifically, on the Internet backbone network. Though urban planners and local governments currently play a minimal role in building telecommunication infrastructure, decision makers in the telecommunications industry, as well as urban policymakers at the national level, have recognized the importance of, and need for, policy and regulation.

Today's Internet is composed of various infrastructures. The impact of telecommunications infrastructure and the Internet on urban systems, businesses, academia, government, and consumers has increased dramatically since the Internet's commercialization. Users have become more reliant on Internet infrastructure and technology to carry out basic functions and communication activities; consequently, they have become more vulnerable to network disruption. The first step in answering questions of vulnerability and risk related to telecommunication networks is to incorporate different types of telecommunication data into one common information


This research project includes various types of telecommunication and Internet infrastructure data. The research also examines present policy governing telecommunication networks and infrastructure, as well as pending policy that will further affect the industry. To aid our understanding of this problem, it is imperative to determine the geographical locations of the links and nodes that are most valuable to the telecommunication industry: the links and nodes that would have the greatest negative impact should information flows through these assets be disrupted. The results of the analysis will then be placed in the context of national and regional security. Identifying the physical location and interdependencies of critical links and nodes will allow security emphasis to be redirected to those network components that are currently most vulnerable to an attack, thereby creating a prioritization scheme for building greater redundancy to minimize the impact of node or linkage disruption. The methodology employed in this dissertation can be applied again and again to reassess this network.

Statement of the Problem and Hypotheses

This research was intended to explore telecommunication network connectivity and vulnerability. Currently, one of the most pressing issues of homeland security in the U.S. is the protection of critical infrastructure. In response to growing concerns over domestic terrorism, this research focuses on the telecommunication network and geographic variations in its susceptibility to an attack. Extra insurance in the form of protection and prevention is required to preserve the most valuable links and nodes of this network; hence, the locations of the most valuable components must be identified. Any disruption that might lead to the loss of a node or link in a network could have devastating consequences for the overall network. Determining the most critical links and nodes will allow for increased protection of the infrastructure.

Preliminary research on this project indicates that highly connected nodes will be the most critical to the overall network. It is hypothesized that the most important nodes will house the most bandwidth. It is also hypothesized that the most highly connected nodes presently house multiple interconnection facilities called colocation facilities (a more detailed description of these facilities can be found in Chapter 2); the demand for interconnection increases in direct proportion to fiber bandwidth. Although these nodes may be the most directly connected in the network, they are not necessarily at the top of the nodal ranking in terms of both direct and indirect connectivity. It is further hypothesized that the most critical links will be connected to the most critical nodes.
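The difference between direct and indirect connectivity can be made concrete with a small sketch. It assumes one common graph-theoretic formulation, the connectivity (T) matrix T = A + A^2 + ... + A^d, where A is the binary adjacency matrix and d is the network diameter; a node's row sum in T counts every walk of length 1 through d that starts at that node. The network below is a hypothetical toy example (two hubs, G1 and G2, each with four spur nodes, joined through a bridge node B), not data from this study, and the function names are illustrative.

```python
from collections import deque

# Hypothetical toy network: two hubs (G1, G2), each with four spur nodes,
# joined only through a bridge node B. Labels are illustrative, not study data.
NODES = ["B", "G1", "G2", "a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]
EDGES = [("G1", "B"), ("G2", "B")]
EDGES += [("G1", f"a{i}") for i in range(1, 5)]
EDGES += [("G2", f"b{i}") for i in range(1, 5)]


def adjacency_matrix(nodes, edges):
    """Binary, symmetric adjacency matrix for an undirected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


def multiply(x, y):
    """Plain matrix product of two square integer matrices."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def diameter(matrix):
    """Longest shortest-path length in links (breadth-first search); assumes a connected graph."""
    size = len(matrix)
    longest = 0
    for start in range(size):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in range(size):
                if matrix[node][nbr] and nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        longest = max(longest, max(dist.values()))
    return longest


def connectivity_scores(nodes, edges):
    """Return (degree, T-matrix row sum) for each node."""
    a = adjacency_matrix(nodes, edges)
    size = len(nodes)
    total = [[0] * size for _ in range(size)]
    power = a
    for _ in range(diameter(a)):
        # Accumulate A^k for k = 1..d, one power per iteration.
        total = [[total[i][j] + power[i][j] for j in range(size)] for i in range(size)]
        power = multiply(power, a)
    return {node: (sum(a[i]), sum(total[i])) for i, node in enumerate(nodes)}


if __name__ == "__main__":
    ranked = sorted(connectivity_scores(NODES, EDGES).items(),
                    key=lambda item: -item[1][1])
    for node, (degree, t_sum) in ranked:
        print(f"{node}: degree={degree}, T row sum={t_sum}")
```

Under these assumptions the bridge node B, with only two direct links, outranks both hubs (five direct links each) on the T-matrix row sum, 84 versus 77, because every connection between the two halves of the network must pass through it. The measure counts walks that double back on themselves, so it is one way, not the only way, to fold indirect connections into a nodal ranking.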


This dissertation will also identify critical clusters of Internet activity on the east and west coasts of the U.S., as well as other regional subnetworks or clusters. It is hypothesized that the most highly connected places contain the largest amount of bandwidth and the highest number of connections to other places, thus housing the most infrastructure. There is, however, a definite variability in the prominence of links and nodes, as well as a varying degree of vulnerability. The complexity of a network multiplies as the number of links and nodes comprising it increases. Just as the complexity and size of networks vary, so might the vulnerability of each link and node within the overall network. Links and nodes that are more critical to the overall network will have a greater impact on the network should they become disconnected from it. Furthermore, it is hypothesized that the overall impact of removing a node or linkage may not be obvious without a more in-depth analysis that considers all direct and indirect connections. Connectivity indices will be used to determine the most critical links and nodes in the network. It is hypothesized that places that are less prominent to the overall network, but more highly connected in a regional sense, may be more vulnerable to an attack and potentially more disruptive.

Finally, it is hypothesized that there will be great variability in the overall effect of disruption or termination of flows through various nodes and linkages, with regional implications that are not obvious. When a link or node is damaged or disrupted, the connective properties of nodes and links may change completely. Applying graph theory to the network will highlight its structure and properties and allow its components to be tested. This will allow for the identification of circuits and redundancy that would increase the overall connectivity of the network and minimize the impact of disruption.
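The node-removal idea can be sketched under simplifying assumptions: the network is unweighted and undirected, and the impact of losing a node is measured only as the number of node pairs among the remaining nodes that can no longer reach each other. The six-node network (a circuit A-B-C-D with a spur D-E-F) and the function names are hypothetical; the removal scenarios in this study may use other measures, such as changes in connectivity indices or shortest paths.

```python
# Hypothetical six-node network: a circuit A-B-C-D-A with a spur D-E-F.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("D", "E"), ("E", "F")]


def build_graph(edges):
    """Undirected adjacency sets keyed by node label."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def component_sizes(graph, removed):
    """Sizes of connected components once `removed` is taken out of service."""
    seen = {removed}  # marking it as seen means no path can pass through it
    sizes = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)
    return sizes


def disconnected_pairs(graph, removed):
    """Node pairs (excluding `removed`) that lose every path between them."""
    remaining = len(graph) - 1
    all_pairs = remaining * (remaining - 1) // 2
    reachable = sum(s * (s - 1) // 2 for s in component_sizes(graph, removed))
    return all_pairs - reachable


def rank_removals(edges):
    """Rank nodes by removal impact, largest first (ties broken by label)."""
    graph = build_graph(edges)
    impact = {node: disconnected_pairs(graph, node) for node in graph}
    return sorted(impact.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for node, pairs in rank_removals(EDGES):
        print(f"remove {node}: {pairs} node pairs disconnected")
```

In this toy case A and E both have two direct links, yet removing A disconnects nothing because the circuit supplies an alternate route, while removing E cuts off F (4 pairs). D, the junction between the circuit and the spur, has the largest impact (6 pairs). Differences of this kind are what the hypotheses above expect to be invisible without considering direct and indirect connections together.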

The Internet

History of the Internet

The Advanced Research Projects Agency Network (ARPANET) was created in the late 1960s as a project of the Advanced Research Projects Agency (ARPA), an agency of the Department of Defense. The Internet is a byproduct of that project; though it appears to be a recent phenomenon, it actually represents decades of development (Abbate 1999). By the mid-1980s the "Net," as ARPANET was nicknamed, had transitioned into the hands of the National Science Foundation (NSF). The NSF allowed the Net to be used strictly for academic and research purposes (Boardwatch 1999). However, private firms and corporate developers of early Internet technology, such as Bolt, Beranek and Newman (BBN), a military contractor and consulting firm involved in the early engineering and development of the project, realized the economic potential of this research network (Schiller 1999). Firms developed networks and collaborated with one another by interconnecting these new networks to create a private-sector version of NSFNET for their corporate clients, essentially duplicating the NSF's Net. The Internet switched from an academically oriented network to a commercially oriented one in 1991, when the NSF decided to allow commercial traffic across NSFNET (Thomas & Wyatt 1999).

Today the Internet is heavily used for e-commerce and related activities, including shopping, trading stocks, real estate, pornography, music downloading, and file sharing, and it is also a valuable information resource. The Internet has become an important forum for news media: major broadcasting companies have websites with audio and video clips updated around the clock, and newspapers post many sections of their printed versions on websites. The Internet has also radically enhanced personal communication via e-mail, chat rooms,1 and instant messaging.2 The Internet has evolved from a single experimental network serving a dozen sites in the United States to a "network of networks" linking millions of computers and servers worldwide (Abbate 1999).

1 A chat room is a place or page in a Web site or online service where people can "chat" with each other by typing messages that are displayed almost instantly on the screens of others who are in the "chat room." Chat rooms are also called "online

2 Instant messaging is a service that alerts users when friends or colleagues are online and allows them to communicate with each other in real time through private online chat areas.

Infrastructure and the Net

The network of networks is a global network of links and nodes connecting millions of computers worldwide to exchange data and news. Several levels of networks are at work, and the interconnection of these networks is key to the functionality of the overall network. The Internet backbones are the long-haul routes that link the nodes of the Internet to users.

The Core of the Global Net Is Centered within the U.S.

Although telecommunications began in America with the introduction and diffusion of Samuel Morse's telegraph, the U.S. has not always been the world leader in telecommunications networks. Great Britain developed superior telecommunication technologies and communication networks early on, in the 1870s (Hugill 1999). Abler (1991) attributes the development of America's telecommunications industry to system-specific software and a series of historic accidents, such as the failure of the U.S. Congress in 1845 to see any value in its ownership of the patent on the telegraph. During the formative stages of the Internet's development, the United States established itself as a world leader in implementing the technologies and infrastructure needed to develop and nurture this innovation. This helps explain why, on a global scale, the U.S. became the center of Internet activity (Cukier 1998, Finnie 1998, Malecki & Gorman 2001).


Cukier (1998) gives several reasons why the Internet is "U.S.-centric": (1) it had a "head-start in building infrastructure and guiding the location of Internet content; (2) the artificially high cost of cross-border capacity outside the U.S.; and (3) customer demand for Internet service" (p. 113). Dodge and Kitchin (2001) describe the distribution of Internet users, stating that in February 2000 the U.S. and Canada accounted for 136.06 million users while having approximately 5% of the world's population, whereas Asia and Europe combined accounted for 126.89 million users despite having well over 60% of the world's population. Internet traffic patterns indicate that over half of European Internet traffic and 70% of Asian Internet traffic travel through the United States. In 1999, the United States housed 58% of Internet hosts and content, and only 6% of the 100 most visited websites were located outside of the U.S. (Cukier 1998). Although the U.S. has a significant lead in telecommunications infrastructure, European telecommunication growth is likely to soon boost Europe to rival the U.S.

Major metro areas in the U.S., and the telecommunication companies that provide Internet infrastructure within them, are in fierce competition for top ranking. In rankings based on Internet activity and infrastructure, these cities constantly shift position, depending on which type of infrastructure is being measured. However, the same seven metro areas consistently make the list: New York, Washington, D.C., San Francisco, Chicago, Dallas, Los Angeles, and Atlanta (Cukier 1998, Finnie 1998, Graham 1999, Malecki & Gorman 2001, Moss & Townsend 2000). New York is the leading Internet hub in the global economy, housing nine fiber networks, more than any other metropolitan area (Finnie 1998).

The metropolitan rankings compiled by Atkinson and Gottlieb (2001) do not match rankings that measure strictly telecommunication infrastructure, because their research includes other economic variables. The Metropolitan New Economy Index (Atkinson & Gottlieb 2001) ranks U.S. metro areas on five subcategories: knowledge jobs, degree of globalization, economic dynamism and competition, the transformation to a digital economy (including infrastructure), and technological innovation capacity.

Internet Links: Backbones and ISPs

The term "Internet service provider" (ISP) is an overgeneralization: it lumps together small, local Internet service providers and globe-spanning Internet backbones, and it does not differentiate among the various types of ISPs. An ISP is a company that provides access to the Internet, serving individuals as well as large companies with direct links. ISPs can play vastly different roles: they can be part of a major backbone and a global network, or they may be local providers leasing infrastructure and serving a limited geographic market.

An Internet "backbone" can be defined simply as a collection of wires that connect the Internet's nodes, linking them together so that they may exchange data. More formally, the National Telecommunications and Information Administration (NTIA) defines the Internet backbone as "a set of paths that local area networks (LANs) connect for long-distance connection. A backbone employs the highest-speed transmission paths in the network. A backbone can span a large geographic area. The connection points are known as network nodes or telecommunication data switching exchanges (DSEs)" (Telegeography 2001, p. 102). Telegeography (2001, p. 102) defines international backbones as "Private data links which cross international political borders, run the Internet Protocol (IP), are reachable from other parts of the Internet and carry general Internet traffic: e-mail, web pages, and most of the other popular services which have come to define today's Internet." A backbone firm, such as Sprint, may serve as an ISP itself, or an ISP may lease access to the Internet from a backbone provider. At the same time, many different ISPs use the same fiber. This means that different communications links, even when obtained from different providers, may run over the same fiber, in the same bundle, or in the same conduit (NRC 2001).

Cukier (1998) classifies ISPs into four groups: (1) backbone ISPs; (2) downstream ISPs; (3) online service providers (e.g., America Online, Microsoft Network); and (4) firms specializing in website hosting (e.g., Qwest) (Table 2-1). Backbone ISPs include those that may connect the Internet globally and transfer the largest amounts of data, such as Exodus, Globix, Sprint, MCI WorldCom, and IBM.

Table 2-1. Internet service provider groups
ISP group                           Example
Level 1: National backbone ISPs     SprintLink, MCI
Level 2: Downstream ISPs            AOL TimeWarner, WorldCom
Level 3: Online service providers   America Online, Microsoft Network (MSN)
Level 4: Website hosting            Qwest

According to Malecki and Gorman (2001), 48 national backbone operators currently operate the transit network in the U.S. The next level consists of the downstream ISPs, hundreds of local and regional ISPs that serve mainly individuals and small and medium businesses. Malecki & Gorman (2001, in Brunn & Leinbach 1991, p. 91) divided the Internet hierarchy into five levels (Table 2-2). All networks exchange data on the first level. The second level makes the transfer of data possible among cities around the world. Regional networks comprise the third level, though Malecki and Gorman warn, "they [regional network providers] may be a dying breed as they are replaced by national providers" (Malecki & Gorman 2001, p. 92). Internet service providers (ISPs) are the fourth level. Finally, Internet users are the fifth level.

Table 2-2. The hierarchy of Internet network interconnections
Level                       Providers                      Example
Level 1: Interconnection    Network Access Points (NAPs),  Ameritech Chicago NAP
                            private peering points
Level 2: National           National backbone operators    SprintLink, MCI
Level 3: Regional networks  Regional network operators     Erols, Rocky Mountain Internet, Inc. (RMII)
Level 4: ISPs               Internet service providers     DialNet, bright.net
Level 5: Users              Business and consumer

"Middle-Mile" Network Links

These are the links of the "aggregators," firms that connect large data customers, such as firms in office buildings and office parks, to the local points of presence (POPs) of backbone networks. Many utilities, such as Gainesville Regional Utilities (GRU) and Florida Power and Light (FPL), serve as middle-mile providers in their service areas.

Middle-mile facilities connect the backbone fiber owned by large national telecommunications firms, such as Sprint or MCI WorldCom, to regional networks that may be owned by utility companies or smaller telecommunication firms. Middle-mile facilities are an integral part of the Internet's backbone hierarchy, providing linkage between national/regional networks and local networks. A recent report by the Federal Communications Commission (FCC) (2000) includes middle-mile facilities as one of the three main types of Internet network components (backbone facilities and last-mile3 facilities comprise the other two). Thus far, little or no research has been done to analyze this scale or segment of the Internet.

The Last-Mile Connection: From Backbone to Computer

The "last-mile" connection refers to the connection between the Internet backbone and the user; it is the final physical linkage between user and network. A user's connection to the Internet may be one of three types: dial-up, continuous, or wireless. A continuous connection is the most efficient, enabling instant delivery of e-mail, eliminating the need to tie up a phone line, allowing businesses to advertise and publish directly to the Internet, and enabling "real-time energy management" (Hurley & Keller 1999, p. 3). Most households currently access the Internet through dial-up connections, using a modem and a telephone line; this method is the cheapest way to access the Internet. Users may not have the option of a faster, more expensive type of connection because more sophisticated Internet infrastructure is not offered in their neighborhood. Integrated Services Digital Network (ISDN), developed in the early 1980s to improve telephone service and introduced in the early 1990s, provides moderate bandwidth (64 to 128 Kbps4). ISDN was efficient for connecting to the Internet, though its rollout was a slow and lengthy process. Soon after, more technologies emerged: cable modems offering high-bandwidth, continuous connections (10 Mbps5 or 30 Mbps) and Asymmetric Digital Subscriber Line (ADSL), a copper-wire technology (8.192 Mbps down/640 Kbps up). These new technologies boasted higher bandwidth, which allowed for a higher data transfer rate and volume. The connections were dedicated to data transfer, allowing the user to maintain a constant, uninterrupted transfer.

Physical distance matters with many types of sophisticated telecommunications infrastructure, such as digital subscriber line (DSL): the closer one is to the service provider, the better the service (Moss 1998). With a full range of options for providing high-bandwidth local access, it is clear that no single technology will be declared the "all-around winner" (Hurley & Keller 1999, p. 37). These technologies remain competitive, with no clear user preference between cable modems and DSL. Wireless is the newest form of access to the Internet. The development and deployment of wireless service to provide mobile access hinges on the results of current FCC efforts to open up radio-frequency spectrum for such services (NRC 2001). Currently, a wireless connection to the Internet is not considered secure and can be decrypted by hackers quite easily. This lack of security greatly deters wireless users from making financial transactions or transferring valuable data via a wireless connection. Whichever technology is used to connect to the Internet, the user is able to navigate the Internet, check e-mail, and communicate with other users. When a user requests data, the request travels from one network to the next to reach the data source, and the data packets travel back through the Internet to complete the request.

3 "Last mile" is the term used to describe the connection between the user and the ISP, which is typically the slowest aspect of Internet access.

4 Kbps: kilobits per second (thousands of bits per second). Kbps is a measure of bandwidth (the amount of data that can flow in a given time) on a data transmission medium.

5 Mbps: megabits per second (millions of bits per second). Mbps is a measure of bandwidth on a transmission medium.
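What the last-mile bandwidth figures mean in practice can be shown with simple arithmetic: transfer time equals file size in bits divided by link speed in bits per second. The sketch below assumes a hypothetical 5 MB file, decimal units (as in the footnote definitions, 1 Kbps = 1,000 bits per second), and an idle link with no protocol overhead or latency, so its times are idealized lower bounds rather than measured performance. The names `RATES_KBPS` and `transfer_seconds` are illustrative.

```python
# Nominal rates quoted in the text, in kilobits per second (1 Kbps = 1,000 bps).
RATES_KBPS = {
    "ISDN (64 Kbps)": 64,
    "ISDN (128 Kbps)": 128,
    "ADSL upstream (640 Kbps)": 640,
    "ADSL downstream (8.192 Mbps)": 8_192,
    "Cable modem (10 Mbps)": 10_000,
}


def transfer_seconds(file_megabytes, rate_kbps):
    """Idealized transfer time: file size in bits divided by link rate in bits/s.

    Assumes decimal units (1 MB = 1,000,000 bytes) and ignores protocol
    overhead, latency, and congestion, so real transfers take longer.
    """
    bits = file_megabytes * 8 * 1_000_000
    return bits / (rate_kbps * 1_000)


if __name__ == "__main__":
    for label, rate in RATES_KBPS.items():
        print(f"{label}: {transfer_seconds(5, rate):.1f} s for a 5 MB file")
```

Under these assumptions the 5 MB file takes 625 seconds over a single 64 Kbps ISDN channel but under 5 seconds over the 8.192 Mbps ADSL downstream link, which is the gap in "data transfer rate and volume" the text describes.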

Internet Nodes

The original nodes of the Internet were universities and research institutions invited by the Department of Defense to participate in the ARPANET project; they included prestigious research institutions such as MIT, Carnegie Mellon, and UCLA, to name a few. These universities were given large grants in 1965 to create computing research "centers of excellence." These research centers, dispersed throughout the United States, were connected to form ARPANET. It was the original users who transformed ARPANET into the Internet we know today. The users of the network could create new applications with few restrictions and had the incentive and ability to experiment with the Internet to mold it to better meet their immediate needs, for example, by building new hardware or software, or by using the existing infrastructure in new, improved ways (Abbate 1999).


Large businesses, universities, and other large institutions often have direct links that allow the user to bypass the telephone network and connect directly to a metropolitan-based network or to Internet backbones (Malecki & Gorman 2001). Langdale (1989) explains that large global companies leasing sophisticated networks built on high-speed circuits dominate international business telecommunications traffic; these networks allow the firms to link to networks housed in major industrialized countries. The cost of communication networks is determined largely by their maximum capacity, whereas the traffic they carry depends on how heavily they are used. Thus, increasing the efficiency of data transport would make the Internet less expensive and more useful (Odlyzko 2000).

The Internet today is a remarkable network, but it is the individuals, organizations, government, businesses, and educational institutions incorporated within and linked by the Internet that make it the invaluable resource it has become. The network is a medium for the exchange of information, data, and ideas among those who use it. The Internet is the most influential advancement in the distribution and exchange of information since the telephone (Moss & Townsend 2000).

Network Interconnection

In February 1994, the National Science Foundation (NSF) designated four nodes as Network Access Points (NAPs): San Francisco, operated by PacBell; Chicago, operated by Bellcore and Ameritech; New York, operated by SprintLink (this NAP is actually in Pennsauken, New Jersey); and Washington, D.C., operated by Metropolitan Fiber Systems. NAPs are "sites where private commercial backbone operators could interconnect" (Boardwatch 1999, p. 13). When NSF completely transferred responsibility and rights to the Internet to commercial entities in 1995, the networks interconnected only at the NAPs. When the NSFNET backbone was shut down and transferred to the commercial entities, the NAP architecture became the Internet (Boardwatch 1999). Each of these four NAPs would be maintained and operated by telecommunication companies rather than the NSF.

As the Internet continued to grow and develop, traffic increased dramatically, the NAPs became increasingly congested, and demand for more interconnection points began to increase. The core of U.S. Internet interconnection remains the four "official" NAPs, along with other major connection points, the Metropolitan Area Exchanges (MAEs) (Boardwatch 1999, p. 13). Thirty-eight of 41 major backbone networks in the U.S. connect at both MAE East and MAE West (Malecki 2000), which suggests that the importance of the NAPs and MAEs has not declined. The MAEs and NAPs were, and still are, considered public facilities where backbones and ISPs can interconnect and colocate at little or no cost. However, because a large number of networks link at each NAP, congestion and inefficiency are common.

The solution to the demand for more efficient, faster connections was private interconnection, a practice originally termed "private peering." Network peers connected at private locations rather than at the NAPs. The term private peering is sometimes misused to describe network interconnections in general, whether the networks are equals or gross unequals (for example, a national Internet backbone provider network would not be equal to a local Internet service provider). Private interconnection is a relationship between two or more ISPs in which the ISPs create a direct link between each other and agree to forward each other's packets directly across this link instead of using the standard NAPs or the Internet backbone. Simply put, peering takes place between network equals, and interconnection takes place between unequal networks, with the weaker party paying for transit. Interconnection can involve more than two ISPs. In this situation, all traffic destined for any of the ISPs is first routed to a central exchange, called a peering point, and then forwarded to its final destination. Private peering points function similarly to the larger interconnection points because they provide interconnection between networks. Colocation facilities operate on a smaller scale than the NAPs, IX facilities, and MAEs, with contracts and higher fees for the users (Boardwatch 2000). The commercial Internet

operates as a machine of collaboration and cooperation between networks. "Every ISP

network must inter-operate with neighboring Internet networks in order to produce a

delivered outcome of comprehensive connectivity and end-to-end service" (Huston

1999, p. 1).
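The scaling pressure behind shared exchange points can be illustrated with a simple count. The Python sketch below compares the number of links needed if n ISPs each interconnect privately with every other ISP (a full mesh) against the number needed if each ISP connects once to a central peering point. It is a simplification that assumes one link per connection and ignores capacity, cost, and the peering-versus-transit distinction; the function names are hypothetical.

```python
def full_mesh_links(n: int) -> int:
    """Direct links needed for every pair of n ISPs to interconnect privately."""
    return n * (n - 1) // 2


def exchange_links(n: int) -> int:
    """Links needed if each of n ISPs connects once to a shared peering point."""
    return n


for n in (2, 5, 10, 41):
    print(f"{n:>3} ISPs: full mesh = {full_mesh_links(n):>4}, via exchange = {exchange_links(n):>3}")
```

With the 41 major backbones cited above, a full mesh would need 820 links versus 41 connections to a single exchange; the price of the shared point, as the text notes, is congestion.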

This chapter has discussed the relevance of this dissertation in relation to current

concerns of the protection of critical infrastructure, particularly telecommunication

networks. Because the Internet is directly tied to the economy, a large disruption to it would have serious repercussions. This dissertation determines the most important links and nodes in

the Internet backbone network by employing network analysis. It is hypothesized that the

links and nodes most critical to the network will have a high concentration of fiber

bandwidth and an equally high concentration of colocation and interconnection facilities.

Chapter 2 presents a review of literature relating to networks, geography and

telecommunications. Chapter 3 explains the data, analysis, and methodology used in

this research. Chapter 4 presents and discusses an unweighted analysis of the U.S. Internet backbone network. Chapter 5 presents and discusses a weighted analysis of the U.S. Internet backbone network. Chapter 6 presents a statistical analysis of network measures, using indices derived from Chapters 4 and 5. Chapter 6 also summarizes the findings and discusses possibilities for future research on the U.S. Internet backbone network.



In order to fully understand the research implications and to identify related

research to further aid in answering the proposed questions, this chapter examines

relevant literature and reviews basic concepts in network analysis. The first section focuses on network concepts and constructs, with a brief overview of networks. The second section reviews geographic literature of networks and information, concentrating on network analysis, telecommunication infrastructure analysis, and the Internet backbone. The third section has two subsections: the history of the Internet alongside parallel developments in telecommunications, and the importance of telecommunications to the city.

Network Concepts and Constructs

A short glossary of network terminology and concepts is provided below to facilitate the discussion and analysis. These definitions serve as a point of reference and help clarify various network concepts. Most simply, a network is an interconnected system of objects or people, consisting of a set of nodes and a set of links; the smallest network is two nodes connected by a link. Nodes and links are the main components of a network: nodes (vertices) are the connecting points or objects in a graph or network, while links (edges) are the connections between them.

Graphs are abstract representations or models of a network. The terms vertices and edges are most commonly used when describing a graph: a vertex refers to a node and an edge refers to a link. In this text, vertices and nodes, and edges and links, are used interchangeably. Graph theory studies network properties through abstract configurations of points and lines. In a directed graph, each edge is an ordered pair of vertices, so the link has a direction; in an undirected graph, each edge is an unordered pair of vertices, and the link has no direction.
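A common way to store a graph in code is an adjacency list. The minimal Python sketch below (the function name and the city pairs are illustrative assumptions, not data from this study) shows how the ordered-versus-unordered distinction changes what gets stored.

```python
def build_adjacency(edges, directed=False):
    """Build an adjacency list (dict of sets) from (u, v) vertex pairs.

    Directed: each pair is ordered, so only the link u -> v is stored.
    Undirected: each pair is unordered, so the link is stored in both directions.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())  # keep vertices that have no outgoing links
        if not directed:
            adj[v].add(u)
    return adj


links = [("Chicago", "New York"), ("New York", "Washington")]
undirected = build_adjacency(links)
directed = build_adjacency(links, directed=True)
print(sorted(undirected["New York"]))  # ['Chicago', 'Washington']
print(sorted(directed["New York"]))    # ['Washington']
print(sorted(directed["Washington"]))  # []
```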

A network component is the set of nodes to which a vertex belongs: all the nodes that can be reached from it by paths running along the edges of the graph. A geodesic path is the shortest path through the network from one node to another, while the diameter is the longest geodesic path between any two nodes (Kochen 1989). A circuit is a path or route within a network that leads back to its starting node.
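The definitions of component, geodesic path, and diameter all rest on breadth-first search (BFS), which finds fewest-link distances in an unweighted graph. The Python sketch below assumes an undirected graph stored as a dictionary mapping each node to a set of neighbors; the helper names and the five-node example are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic (fewest-link) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def components(adj):
    """Split the nodes into components: sets of mutually reachable nodes."""
    seen, parts = set(), []
    for v in adj:
        if v not in seen:
            part = set(bfs_distances(adj, v))
            seen |= part
            parts.append(part)
    return parts


def diameter(adj):
    """Longest geodesic path; assumes adj is a single connected component."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)


network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
print(components(network))  # two components: {A, B, C} and {D, E}
print(bfs_distances(network, "A"))  # A is 0 links away, B is 1, C is 2
main_part = {v: network[v] for v in ("A", "B", "C")}
print(diameter(main_part))  # 2
```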

A network connection, or the connectivity of a node, describes the relationship between nodes or links in a network: a topological description that specifies the interconnections between nodes. Connections may be direct or indirect. Accessibility describes the degree to which a node can be reached or accessed by other nodes in the network, given their absolute or relative locations. Location-based accessibility is the degree to which a node can be reached or accessed given its location

in relation to the other network nodes, as based on physical distance and spatial


The nodality of a node refers to the degree of a node's dominance within a

network. Measuring the location of a network's components means determining the

centrality of a node within the network (Kochen 1989, Perrucci & Potter 1989). A

gateway is a nodal characteristic that describes a main entry-and-exit point for a region

or network. A disconnected node or segment has been detached or removed from the

network or subnetwork.
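Nodality, centrality, and location-based accessibility are operationalized in many ways. As one illustration, assumed here rather than taken from this chapter, the sketch below uses degree (the number of direct links) as a simple proxy for nodality and the Shimbel index (the sum of a node's geodesic distances to all other nodes, where lower values mean higher accessibility) as a topological accessibility measure. It assumes a connected, undirected graph; the star network is hypothetical.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Fewest-link distances from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree(adj):
    """Number of direct links per node, a simple proxy for nodality."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def shimbel_accessibility(adj):
    """Sum of geodesic distances from each node to all others (lower = more accessible)."""
    return {v: sum(geodesic_distances(adj, v).values()) for v in adj}


star = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
print(degree(star))                 # {'H': 3, 'A': 1, 'B': 1, 'C': 1}
print(shimbel_accessibility(star))  # {'H': 3, 'A': 5, 'B': 5, 'C': 5}
```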

Structural equivalence is a measure of the similarity in the roles that nodes play in a network. For example, in Figure 2-1, nodes V1 and V3 are structurally equivalent, playing similar roles within their regional cluster of nodes as collector nodes, or regional hubs. Structural holes are areas of no connection between nodes that could be used for advantage or

Figure 2-1. Network A

Network centralization is equivalent to a node's location or position in a network considering all direct and indirect links; from a multi-faceted standpoint, it describes how a node is characterized in terms of connectivity, accessibility, its propensity to cause disconnects upon its removal or failure, and its importance in adding redundancy or circuits as a back-up node should other nearby nodes be removed or fail.

"Betweenness" is a measure of influence over what flows in the network, a

measure of power that a node has based on its relative location or position in the

network. For example, in Figure 2-2, node V3 is an important node in that it helps

connect two distinct regional clusters of lesser-connected nodes. Note that the removal

of node V3 would cause the network to become disconnected (Krebs 2004).
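Betweenness is usually computed by counting, for every pair of nodes, the share of shortest paths that pass through a given node. The brute-force Python sketch below does this for small, connected, undirected graphs (Brandes' algorithm is the standard faster method) and also checks whether removing a node disconnects the network, as with V3 above. Because Figure 2-2 is not available in machine-readable form, the example is a hypothetical "bow-tie" of two clusters joined only through V3.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances from s and the number of distinct shortest paths to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u is a predecessor of v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Unnormalized betweenness for a small, connected, undirected graph."""
    info = {v: bfs_counts(adj, v) for v in adj}
    score = dict.fromkeys(adj, 0.0)
    for s, t in combinations(adj, 2):
        d_s, sig_s = info[s]
        d_t, sig_t = info[t]
        for v in adj:
            if v not in (s, t) and d_s[v] + d_t[v] == d_s[t]:
                # share of s-t shortest paths that pass through v
                score[v] += sig_s[v] * sig_t[v] / sig_s[t]
    return score


def is_cut_vertex(adj, node):
    """True if removing node leaves the remaining nodes disconnected."""
    rest = {v: adj[v] - {node} for v in adj if v != node}
    dist, _ = bfs_counts(rest, next(iter(rest)))
    return len(dist) < len(rest)


bowtie = {
    "V1": {"V2", "V3"}, "V2": {"V1", "V3"},
    "V3": {"V1", "V2", "V4", "V5"},
    "V4": {"V3", "V5"}, "V5": {"V3", "V4"},
}
scores = betweenness(bowtie)
print(scores["V3"])  # 4.0: V3 lies on every shortest path between the two clusters
print(is_cut_vertex(bowtie, "V3"), is_cut_vertex(bowtie, "V1"))  # True False
```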

"Closeness" is a measure of how far any node is to any other node in the network.

For example, nodes V3, V7, and V8 in Figure 2-2 are at most two links away from any other node in the network. Tier 1 in the nodal hierarchy in Figure 2-2 would therefore consist of nodes V3, V7, and V8. The second tier would consist of the remaining nodes, V1, V2, V4, V5, V6, and V9. Note: closeness differs from diameter; the diameter (d = 3 for this network) represents the minimum number of links between the two most distant points in a network (Krebs 2004).

Figure 2-2. Network B
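As used here, "closeness" corresponds to what graph theory calls eccentricity: the largest number of links from a node to any other node (closeness centrality is often defined instead from the sum of distances). The Python sketch below computes eccentricity, groups nodes into tiers by ranking eccentricity values (an assumed rule that mirrors the tiering described above), and reports the diameter as the largest eccentricity. The four-node path is hypothetical, since the links of Figure 2-2 are not reproduced here.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest number of links from source to any other node (connected graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def tiers_and_diameter(adj):
    """Rank nodes into tiers by eccentricity (tier 1 = smallest) and return the diameter."""
    ecc = {v: eccentricity(adj, v) for v in adj}
    levels = sorted(set(ecc.values()))
    tiers = {rank + 1: sorted(v for v in adj if ecc[v] == e) for rank, e in enumerate(levels)}
    return tiers, max(ecc.values())


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
tiers, d = tiers_and_diameter(path)
print(tiers)  # {1: ['B', 'C'], 2: ['A', 'D']}
print(d)      # 3
```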

"Boundary Spanners" are nodes that are more central than their immediate

neighbors whose connections are only local. In short, they are regional hubs or the

center or predominant node in a regional cluster or subnetwork (Krebs 2004).

"Peripheral Players" are nodes that are often connected to outside or external

networks that are not currently mapped, making them of greater importance than nodes

that are not directly linked to those external networks. Hence, their importance may be

understated (Krebs 2004, Kochen 1989).

Network Categorization

Networks vary in nature, size, and purpose. They may be environmental, human, or technical; tangible or intangible; physical or virtual; and private or public. Several disciplines study and use networks for various purposes, including transportation, landscape ecology, geography, neurology, telecommunications, communications, physics, computer science, economics, health care and medicine, electric and gas, water distribution and resources, urban planning, mining, and the geosciences. Networks can be organized into various categories. The following section discusses physical and virtual networks, technical networks, environmental and natural networks, social and human networks, and private and public networks.

Physical and virtual networks

Physical networks consist of two or more nodes connected by physical links, such as roads, wires, corridors, streams, cables, or pipes. The U.S. Interstate Highway System is

an example of a physical network.

Virtual networks exist conceptually, rather than being physically real. Though a

virtual network may not have a physical composition, it still serves the same purpose as physical networks: to connect various objects for the sharing of data and knowledge.

Networks can be both physical and virtual in nature, the prime example being the

Internet (www.webopedia.com). Some social networks might also be considered both physical and virtual, since their members may meet physically or be virtually connected by association.

Technical networks

Technical networks are used in science or industry and are of a mechanical nature.

They typically include sophisticated equipment that may be electronic or computerized.

An electric company providing power to a neighborhood is an example of a technical

network. The power grid itself is public, using the power plant, high-voltage

transmission lines, power substations, transformers, power poles, and transformer

drums to move power across a network to consumers. All of the equipment is electrical,

mechanical, or computerized in nature.

Optical networks are high-capacity telecommunications networks based on optical

technologies and components that provide routing, grooming, and restoration at the

wavelength level as well as wavelength-based services. Optical networks are providing

more advanced capabilities as well as lower costs for the telecommunications industry.

"In the early 1980s, a revolution in telecommunications networks began that was

spawned by the use of a relatively unassuming technology, fiber-optic cable. Since then,

the tremendous cost savings and increased network quality has led to many advances in

the technologies required for optical networks, the benefits of which are only beginning

to be realized" (IEC 2002).

Natural and environmental networks

Natural networks include environmental networks. Because the environment encompasses disease and epidemiology, the percolation of disease through a network also falls into this category. The transfer of biological agents occurs through natural networks. The sudden onset of Severe Acute Respiratory Syndrome (SARS) in early 2003 reminded us of the dangerous ability of disease to travel rapidly through a network of individuals or entities. The Centers for Disease Control and Prevention (CDC) advised against unnecessary travel to areas affected by SARS (CDC 2003), in an attempt to prevent the transmission of the disease through transportation and social networks. Other natural networks include hydrological networks, which describe the movement and interaction of water, including the interactions of lakes, streams, and rivers with each other and their other environmental impacts.

Neural networks are composed of processing elements called units that respond in parallel to a set of input signals given to each. Neural networks are most closely associated with the study of the human brain, as the unit closely represents the brain's neuron. In the medical field, neural networks are referred to as artificial neural networks (ANNs); they have existed since the 1950s and are used in a wide variety of applications, including speech recognition, which was the original intent of creating an ANN. "Artificial neural networks (ANN) are a very simple model of the brain" (O'Sullivan & Unwin 2003, p. 364). ANNs now have geographic applications as well. They are based on the idea that "brainlike structure = intelligence." This type of network can operate in two types of environments: supervised or unsupervised. A supervised network is one that has been trained or programmed with a known set of data. An unsupervised network operates in a more traditional form, in that it eventually "settles to a state such that different combinations of input data produce different output combinations that are similar to a clustering analysis solution" (O'Sullivan & Unwin 2003, p. 365).

Social and human networks

Social networks are formed as a means to communicate with and within a group.

The nodes in social networks represent people or groups, while the links represent

relationships or flows between them. Examples of social networks might include a

school, political organization, a firm, unionized workers, and friends. Other types of human networks are less obvious. For example, transportation networks could be considered human. Transportation networks are built for the purpose of

transferring goods and information. However, in order to create one network, the

disruption of another may occur. Road and railroad networks may cause landscape

fragmentation that might affect other types of networks. Recent research has focused on

the environmental impact of road networks, and further the impact upon animal behavior

and population (Carleton 2003). A review of relevant social network analysis follows later

in this section, within the review of complex network topology literature.

Private and public networks

Public networks can be used free of cost, while private networks generally charge some type of user fee. Government agencies, businesses, and individuals can, and do, set up their own private networks. Private networks allow for security and privacy by controlling user access to the network. A number of public networks, both large and small, are commonly available to government, business, and consumers in the USA. Some of the most common public networks are the Public Switched Telephone Network (PSTN, wired telephones); Public Wireless Voice Networks (PWVN, such as cellular and Personal Communications Service [PCS]); Public Wireless Radio Paging Networks (PWRPN); and the Internet.

Private networks, such as government and corporate networks, may or may not connect to public networks such as the Internet. However, public and private networks increasingly overlap. Private and public networks may be physical, virtual, or both. A private virtual telecommunication network allows an individual or entity to remotely access a larger network using public network infrastructure, while maintaining privacy and security by encrypting the data that are exchanged.

The "Small World" Phenomenon

Many have studied the small world phenomenon, including, but certainly not limited to, Milgram (1967), Kochen (1989), Wasserman and Faust (1994), Ozana (2001), Albert and Barabasi (2002a), and Watts (2003). The small world concept is that, though a network may seem vast in size, its nodes are connected by short paths. Milgram (1967) pioneered the concept and its most popular theme, "six degrees of separation." Milgram concluded that most pairs of people in the U.S. were linked by chains of acquaintances averaging about six steps.

Network Analysis

Complex Network Topology Literature

This group of network literature explores the topological properties of real networks. Complex network analysis is used in a variety of disciplines to understand various networks and real systems (Albert & Barabasi 2002b, p. 49). Albert and Barabasi (2002b) give a very detailed review of complex network research in their paper Statistical Mechanics of Complex Networks, published in Reviews of Modern Physics. Newman (2003) also gives an excellent review of complex network analysis in his publication The Structure and Function of Complex Networks. These two papers were used as a guide for the following review of complex network examples.


Social networks

The social network serves as a major source of social capital. Being part of a social network means communicating with and within a group. Watts (2003, p. 48) describes two techniques for approaching social networks: network structure and social structure. Sociologist Mark Granovetter (as cited by Watts 2003, p. 49) concluded that effective social coordination does not emerge from strong ties within a social network, but rather from occasional weak ties. In his 1973 paper, The Strength of Weak Ties, Granovetter anticipated what Watts (2003, pp. 49-50) now describes as the new science of networks. Watts concludes that "social network analysis still has one major glitch; there is no dynamics" (Watts 2003, p. 50). Thus the measurement and study of social networks is complex, and problems are approached differently depending on the nature of the network and its


Actor collaboration network

One of the most analyzed social networks is the movie actor collaboration network.

This network contains all movies and the casts of these movies since the 1890s. The network, based on the Internet Movie Database, is continuously expanding and being updated. Watts and Strogatz (1998); Newman, Strogatz, and Watts (2001); Barabasi and Albert (1999); and Albert and Barabasi (2000) have all used the movie actor database. Watts and Strogatz (1998) reported that in 1998 the network had 225,226 nodes (actors). By May of 2000, the number of nodes (actors) had grown to 449,913 according to Newman, Strogatz, and Watts (2001). Two actors are linked when they have worked together in a film.

Science collaboration network

The science collaboration network is similar to the movie-actor network. When two

scientists work together, they are connected nodes. Newman (2001a, 2001b, 2001c) studied four databases covering a five-year time frame, which included physics, biomedical research, high-energy physics, and computer science, to determine the topology of this network. Each of the networks shows a small average path length with high clustering coefficients (Albert & Barabasi 2002b). Barabasi et al. (2001) studied the collaboration network of mathematicians and neuroscientists who published between 1991 and 1998 and found degree distributions consistent with those of other collaboration networks.


Sexual contact network

Liljeros et al. (2001) investigated the spread of sexually transmitted diseases (STDs), including AIDS, through a network based on the sexual relationships of 2,810 individuals. The data were obtained from a Swedish survey conducted in 1996 (Albert & Barabasi 2002b). Because the average edge (relationship) in the network is relatively short-lived, they analyzed the distribution of sexual partners over a one-year period.

Cellular networks

The metabolism of 43 organisms was studied by Jeong et al. (2000). In this project, each organism was represented by a network in which the nodes were substrates and the links were chemical reactions. The average path length was found to be roughly the same in each of the organisms. Wagner and Fell (2000) looked at the clustering coefficient while focusing on the energy and biosynthesis metabolism of the E. coli bacterium. Their results show that an undirected version of this substrate graph (network) has a small average path length and a large clustering coefficient (Albert & Barabasi 2002b). Protein-protein interactions within a cell were also considered in the analysis, as they help to characterize the cell network; the proteins are nodes, connected if they bind together.

Citation networks

The nodes in a citation network represent published scientific articles, and a link represents a reference from one article to a previously published article. This network was studied by Redner (1998), using 783,339 papers cataloged by the Institute for Scientific Information and 24,296 papers published in Physical Review D between 1975 and 1994. Following Redner, Vazquez (2001) did a similar study of the citation network, extending it to include the outgoing degree distribution, which was found to have an exponential tail.

Linguistic networks

Ferrer i Cancho and Sole (2001), Yook, Jeong, and Barabasi (2001b), and Steyvers and Tenenbaum (2001) are among the researchers who study the complex networks formed by human language. Steyvers and Tenenbaum's results indicate that languages form networks with dynamics not so different from those of other networks (Albert & Barabasi 2002b, p. 53). Ferrer i Cancho and Sole (2001) created a network of the English language based on the British National Corpus. The nodes represented words, which were linked to each other if they appeared next to each other, or one word apart, in a sentence (Albert & Barabasi 2002b, p. 53). The network consisted of 440,902 words. Ferrer i Cancho and Sole (2001) found that the average path length was small, the clustering coefficient was high, and the degree distribution followed a two-regime power law (Albert & Barabasi 2002b, p. 53). Yook, Jeong, and Barabasi (2001b) used a different network for their study of the linguistic network: two words were linked if they were synonyms according to the Merriam-Webster Dictionary. Their results show a large cluster of 22,311 words out of a total of


Ecological networks

Ecologists study food webs or food chains to determine the network relationships

between various species. In a food network, nodes represent the species and the links

would be the predator-prey relationships between them (Albert & Barabasi 2002b).

Williams et al. (2000) recently studied the topology of some of the largest food webs: Skipwith Pond, Little Rock Lake, Bridge Brook Lake, Chesapeake Bay, Ythan Estuary, Coachella Valley, and St. Martin Island. Though the webs comprised very different species in different habitats, each indicated that species in a habitat are three or fewer links from each other (Williams et al. 2000). The research of Montoya and Sole (2000) and Camacho et al. (2002a) confirmed that food webs show highly clustered nodes. Montoya and Sole's research focused on Ythan Estuary, Silwood Park, and Little Rock Lake, two of which overlapped with the study areas of Williams et al. (2000). Camacho et al. (2002a, 2002b) found that an exponential fit worked well, consistent with the well-documented existence of key species in food webs; such key species correspond to hubs, a common feature of scale-free networks. Forman and Spearling (2002) explore road ecology, discussing the vast network of roads that billions use daily. They point out that until now, little or no network theory has been applied to road networks and ecology. The road network and landscape indeed form a complex network. Forman and Spearling studied the 4 million miles of public roads in the U.S. and determined how much area they affect ecologically. They concluded that about one fifth (20%) of the total U.S. area is directly affected ecologically by the road system.

Telephone call networks

The long-distance telephone call network has been studied by Abello, Pardalos,

and Resende (1999) and Aiello, Chung, and Lu (2000), amongst others. They

constructed a large, directed graph using long-distance telephone call patterns. Phone

numbers represent nodes, while every completed call represents a link. These

researchers used the calling network based on the data from one day. They concluded

that the degree distributions of the outgoing and incoming edges followed a power law

with exponent 2.1.
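To make an exponent of 2.1 concrete: if the share of phone numbers with k links follows a pure power law, P(k) proportional to k^(-gamma), then multiplying k by some factor divides that share by the factor raised to gamma. The sketch below assumes an exact power law with gamma = 2.1 (real data fit one only over a limited range); the function name is hypothetical.

```python
def rarity_ratio(k_small: float, k_large: float, gamma: float = 2.1) -> float:
    """How many times rarer nodes with k_large links are than nodes with k_small
    links, assuming P(k) is exactly proportional to k ** -gamma."""
    return (k_large / k_small) ** gamma


print(round(rarity_ratio(10, 20), 2))   # 4.29: doubling the degree
print(round(rarity_ratio(10, 100), 1))  # 125.9: a tenfold increase in degree
```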

Power and neural networks

The U.S. power grid consists of generators, transformers, and substations, which are the network nodes; the links are the high-voltage transmission lines. The power outage affecting the northeastern U.S. in the summer of 2003 demonstrated the interconnectedness of this network. The degree distribution of the power grid is consistent with an exponential distribution (Albert & Barabasi 2002b, p. 54). Watts and Strogatz studied the neural network of the nematode worm C. elegans, in which the nodes are neurons and two neurons are linked if they are joined by a synapse or a gap junction (Albert & Barabasi 2002b, p. 54). In their research, Watts and Strogatz (1998) found that for both networks (power and neural) the average path length was approximately equal to that of a random graph of the same size and average degree, while the clustering coefficient was much higher (Albert & Barabasi 2002b, p. 54).

World Wide Web & the Internet

One of the most recent complex networks to be examined is the Internet. As introduced earlier, geographers analyze networks, and the geography of networks is often relevant to other disciplines. While geographers were working on early network analysis of transportation networks using graph theory (Kansky 1963, Garrison 1960, Haggett & Chorley 1969), Erdos and Renyi (1960) focused on theoretical work on complex networks. They modeled large networks using algorithms in which each pair of N nodes was connected at random with probability P, and found that the number of links per node followed a Poisson distribution1 (Albert & Barabasi 2002b, p. 49). The network model created by Erdos and Renyi was used widely in several disciplines analyzing networks. The most closely related research of this group would be Internet topology generators (Radoslavov et al. 2000).

1 The Poisson probability distribution is used to analyze how frequently an outcome occurs during a certain time period or across a particular area. Other geographic applications of Poisson involve the analysis of existing frequency count data to determine if a random distribution exists (McGrew & Monroe 2000).
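The Erdos-Renyi model is straightforward to simulate. The Python sketch below builds a G(n, p) random graph, in which each pair of nodes is linked independently with probability p, and compares the observed degree distribution with a Poisson distribution whose mean is p(n - 1). Strictly, the degree distribution is binomial and approaches Poisson for large n and small p; the parameter values and seed are arbitrary choices for illustration.

```python
import math
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p) random graph: each of the n*(n-1)/2 node pairs is linked
    independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


n, p = 1000, 0.004
g = erdos_renyi(n, p, seed=1)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / n
lam = p * (n - 1)  # expected mean degree, about 4 here
print("mean degree:", round(mean_degree, 2), "expected:", round(lam, 2))

# Compare the observed share of nodes with k links to the Poisson probability.
for k in range(9):
    observed = sum(1 for nbrs in g.values() if len(nbrs) == k) / n
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, round(observed, 3), round(poisson, 3))
```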

According to Barabasi et al. (2001), the absence of topological data in the analysis of complex networks makes random network models the most often applied method of network simulation. As computer technology advanced and data for real-world networks became more available, several empirical findings emerged. Three network characteristics resulted most often from complex network analysis: short average path length, a high level of clustering, and power-law and exponential degree distributions (Albert & Barabasi 2002b, pp. 48-49). A short average path indicates a short distance between nodes in a network, while topologically close nodes that are well connected form clusters. In 1998, Watts and Strogatz formalized this cluster concept for complex networks using several large data sets. The real-world networks they analyzed were not completely random but instead displayed clustering at the local level. Local clusters linking to other local clusters formed "small worlds." This analysis was followed by studies performed by Albert and Barabasi (2002b) and Adamic and Huberman (1999), which concluded that when the WWW is studied as a graph, its degree distribution follows a power law2 rather than a Poisson or exponential distribution.
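Average path length and clustering, two of the three characteristics listed above, can be computed directly. The sketch below uses the local clustering coefficient in the Watts and Strogatz sense (the fraction of a node's neighbor pairs that are themselves linked), assigning 0 to nodes with fewer than two neighbors, which is one of several conventions; it assumes a connected, undirected graph. The four-node example is hypothetical.

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined clustering counted as 0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def average_path_length(adj):
    """Mean geodesic distance over all ordered node pairs (connected graph)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# A triangle A-B-C with a pendant node D attached to C.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(average_clustering(g), 3))   # 0.583
print(round(average_path_length(g), 3))  # 1.333
```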

Albert and Barabasi (2002b) have described research of the Internet in two realms: the World Wide Web and the Internet. Albert and Barabasi (2002b) label the documents (web pages) of the WWW as the nodes and hyperlinks (URLs) as the links. Lawrence and Giles (1998, 1999) have estimated the size of this network at close to one billion nodes based on 1999 data. Network research of the WWW has increased as the network experienced rapid growth, and after it was realized that the distribution of links among web pages "followed a power law over several orders of magnitude" (Albert & Barabasi 2002b, p. 49). Albert, Jeong, and Barabasi (1999), Lawrence and Giles (1998, 1999), Adamic and Huberman (2000), and Adamic (1999) are a few of the many researchers who have studied the complex network topology of the WWW and Internet. According to Albert and Barabasi, the topology of the Internet is studied at two levels: the router level and the interdomain level (Albert & Barabasi 2002b, p. 49). At the router level, the nodes are routers and the links are the physical connections between them. At the interdomain level, each domain, consisting of hundreds of routers and computers, is represented by a single node (Albert & Barabasi 2002b, p. 52). Faloutsos et al. (1999) studied both the interdomain level and the router level and concluded that in each case the degree distribution follows a power law. The connectivity of the routers was mapped by Govindan and Tangmunarunkit (2000). Yook et al. (as cited in Albert & Barabasi 2002b, p. 52) and Pastor-Satorras et al. (as cited in Albert & Barabasi 2002b, p. 52) have confirmed in their studies that the Internet does display clustering and small average path lengths.

2 A power law implies that small occurrences are extremely common, whereas large instances are extremely rare. A function f(x) is a power law if it has the form f(x) = cx^k, in which the independent variable x is raised to a constant power k.

The majority of research on complex networks revolves around abstract or theoretical networks in which geography plays no role. But networks do affect geography, and vice versa. At the same time, the Internet is dramatically affecting the city, making research on the Internet's geography relevant and important. The research that follows will contribute directly to the geographic literature on the Internet backbone network, the effect of the Internet upon cities, and the study of complex networks.

Modeling Networks with Geographic Information Systems

A geographic information system (GIS) can be used to model a network. There are

various GIS structures that can be used as tools in modeling linear features, including

coverages, geodatabases, geometric networks, logical networks, and optical networks. Unlike other modeling platforms, GIS-based modeling programs tie linear features to spatial coordinates.

Network modeling is dominated by vector GIS, though most networks can also be
modeled with raster-based GIS (Zeiler 1999, Malczewski 1999). Bernhardsen (2002)
discusses raster connectivity operations, which spread outward from a single starting
point through discrete cell-by-cell displacements. Each cell must hold a value that
governs how movement across the surface proceeds; in other words, the raster cells form
a friction surface. Path attributes such as direction and flow are easier to model in a
vector GIS: raster grid cells only approximate the exact shape of a network line, direction
is not explicitly recorded, and line and node attributes must be stored in a separate layer
(Bernhardsen 2002).
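The raster connectivity operation described above can be illustrated as an accumulated-cost (cost-distance) calculation over a friction surface. The sketch below is a minimal Python version. It assumes eight-neighbor moves and one common costing convention, in which a move costs the mean friction of the two cells multiplied by the step length. The function name and these conventions are illustrative assumptions, not Bernhardsen's (2002) specification or any particular GIS package's implementation. The calculation spreads outward from a single starting cell using Dijkstra's shortest-path algorithm.

```python
import heapq
import math

def cost_distance(friction, start):
    """Accumulated travel cost from `start` to every cell of a friction grid.

    friction: list of rows of positive per-cell costs (cost per unit distance).
    start: (row, col) of the single source cell.
    Moves go to the 8 neighbouring cells; a move costs the mean of the two
    cells' friction values times the step length (1 or sqrt(2)).
    """
    rows, cols = len(friction), len(friction[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    acc[start[0]][start[1]] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > acc[r][c]:
            continue  # stale heap entry; a cheaper cost was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    step = math.sqrt(2) if dr and dc else 1.0
                    nd = d + step * (friction[r][c] + friction[nr][nc]) / 2
                    if nd < acc[nr][nc]:
                        acc[nr][nc] = nd
                        heapq.heappush(heap, (nd, (nr, nc)))
    return acc
```

Suppose a high-friction cell lies between the start and a target. The accumulated cost then reflects the cheaper detour around it, which is the behavior a friction surface is meant to capture.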

GIS-based systems enable the user to take advantage of dynamic segmentation, an
extremely important feature in building a network model. Dynamic segmentation is a
two-step process performed on a spatial data set of linear features. First, a route system
is created by grouping adjacent line segments into one or more routes, each with a
definite linear sequence. Second, descriptive information is associated with the route
system by referencing distances from the starting point of each route. Dynamic
segmentation thus allows small stretches of a line feature to be referenced without
actually breaking the line into smaller pieces, so linear distances can be calculated
directly from the routes and their associated attributes (Northwest GIS Services 2002).
It relies on a linear referencing system, tied to geographic coordinates, that provides a
common datum for referencing locations along linear features (Zeiler 2002).
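The two steps of dynamic segmentation can be sketched in Python as follows. This is an illustrative sketch only. It assumes planar coordinates and a route stored as an ordered list of vertices. The function names and the event table layout (from-measure, to-measure, attribute) are hypothetical and do not reproduce any vendor's linear referencing implementation. Step one assigns each vertex a cumulative measure along the route. Step two locates points and attributes by measure rather than by splitting the line.

```python
import bisect
import math

def build_route(vertices):
    """Step 1: cumulative measure (distance from the route start) at each vertex."""
    measures = [0.0]
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        measures.append(measures[-1] + math.hypot(x1 - x0, y1 - y0))
    return measures

def locate(vertices, measures, m):
    """Step 2: the (x, y) coordinate lying at measure m along the route."""
    if not 0 <= m <= measures[-1]:
        raise ValueError("measure outside route")
    i = bisect.bisect_right(measures, m) - 1
    if i == len(vertices) - 1:  # m equals the route's end measure
        return vertices[-1]
    (x0, y0), (x1, y1) = vertices[i], vertices[i + 1]
    t = (m - measures[i]) / (measures[i + 1] - measures[i])
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

def events_at(events, m):
    """Attributes of all linear events whose [from, to] measure range covers m."""
    return [attr for start, end, attr in events if start <= m <= end]
```

Take a route running from (0, 0) to (3, 0) and then to (3, 4). The measure 5 falls two units up the second segment, at (3.0, 2.0). No vertex is added to the line.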

O'Sullivan and Unwin (2003) explain that although software packages such as ESRI's
Network Analyst show great promise for network analysis, a comprehensive toolkit that
addresses the complexity of line objects and the advanced mathematical concepts needed
for analysis is still years away, because statistical approaches to lines, and to graphs,
have had only limited success. In agreement, Malczewski (1999) notes that some
researchers contend there are operational limitations on the use of optimization models
for spatial decision analysis in a GIS environment. Malczewski nevertheless maintains
that although GIS is presently strongest in data gathering and in visualizing results, it can
be fully integrated to provide a powerful tool for spatial decision support in multicriteria
decision making.

Geodatabases and Geometric Networks

Geometric networks model linear systems such as utility networks and transportation
networks (MacDonald & ESRI 2002), and they support a rich set of network-tracing and
solving functions. A geometric network consists of edge network features and junction
network features (Zeiler 1999); edge elements are connected to other edge elements via
junctions. Network features are of two types, simple and complex: a simple network
feature corresponds to a single network element, while a complex feature corresponds to
more than one network element.
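The edge-junction structure of a geometric network, and the kind of connectivity trace it supports, can be sketched with a simple adjacency structure. This Python example is a hypothetical illustration, not the ESRI geometric network API. Edges are stored as a dictionary mapping an edge identifier to its two end junctions. Every feature is treated as a simple feature (one edge, one element). A set of disabled junctions stands in for devices, such as open switches, that stop a trace.

```python
from collections import defaultdict, deque

def build_network(edges):
    """Map each junction to the (edge_id, other_junction) pairs incident to it.

    edges: dict of edge_id -> (from_junction, to_junction).
    """
    adj = defaultdict(list)
    for eid, (a, b) in edges.items():
        adj[a].append((eid, b))
        adj[b].append((eid, a))
    return adj

def trace_connected(edges, start, disabled=frozenset()):
    """Breadth-first connectivity trace from a start junction.

    Returns the ids of edges reachable from `start`. An edge ending at a
    disabled junction is reached, but the trace does not continue past it.
    """
    adj = build_network(edges)
    seen_junctions, seen_edges = {start}, set()
    queue = deque([start])
    while queue:
        junction = queue.popleft()
        for eid, other in adj[junction]:
            if eid in seen_edges:
                continue
            seen_edges.add(eid)
            if other not in seen_junctions and other not in disabled:
                seen_junctions.add(other)
                queue.append(other)
    return seen_edges
```

Junctions are kept separate from edges because connectivity is defined only at junctions, as in the geometric network model described above. Disabling a junction therefore cuts off everything beyond it without deleting any edge.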

Principal benefits of the geometric network model (Zeiler 2002, p. 128) include the
following:


• Editing networks is simple. When adding network features, a user can ensure that
they are properly connected to the rest of the network with network connectivity rules.

• Network features can represent complex parts of a network, such as switches. This
simplifies the editing process and allows higher-quality maps to be created with fewer
features in the network representation.

• A suite of simple and advanced network analysis solvers is built into ArcInfo, ready
to use. Network analysis is fast even on very large datasets.

• Networks can be versioned. Multiple users can simultaneously edit the same large
network in compliance with their organization's work-flow practices.

The geodatabase, part of ESRI's ArcGIS software, is a data format similar to the
coverage data model. It is a storage mechanism for spatial and attribute data, with
specific storage structures for features, collections of features, attributes, relationships
between attributes, and relationships between features.

Two main concepts are key to understanding a geodatabase:

• A geodatabase is a physical store of geographic information inside a database
management system.

• A geodatabase has a data model that supports objects with attributes and
behavior. Behavior describes how a feature can be edited and displayed. (ESRI

A geodatabase allows multiple users to work from it simultaneously, and geometric
networks are created within geodatabases: the data, network functions, flows, and
relationships stored in the geodatabase are used to build the geometric network model.
Given the capabilities and sophistication of the geodatabase and geometric network
models, these tools would be ideal for building a telecommunication infrastructure data
model, provided that flow or line data are available. The geometric network model
allows several data types to be incorporated into one model, which is what the current
project calls for, since several types of telecommunications data with different
characteristics and capabilities are being studied. Such a model would support not only
the organization of the data but also simulation exercises that would not be possible in
network modeling environments lacking GIS support.

Geographic Literature of Networks & Information

Geographic Network Analysis

The Internet is a data and information transport network able to connect
geographically separated places, moving data from node to node, user to user, service
to service, and workstation to workstation. Although geographers have a long history of
applied network analysis (e.g., Lalanne 1863), relatively little has been done on the
geography of the Internet. Geographic research has focused mainly on transportation
networks. In 1961 Garrison and Marble published their findings on the U.S.
transportation system in The Structure of Transportation Networks. They concluded that
transportation structure depends upon the characteristics of the location housing the
network. Garrison and Marble (1961) also incorporated the work of fellow geographer
Brian Berry (1960) into their analysis by using his measurements of technological and
demographic factors. Berry had done substantial research on networks and economic
variables, incorporating technological and demographic variables into his work; he
synthesized statistical measurements of levels of development to reveal the basic factors
underlying variation in those measurements. Working with Berry, the researchers were
able to incorporate national development into their analysis using regression methods.
They found that technological development was the major determinant of network
structure, and that the physical characteristics of a location explain network structure
less well than its level of development (Taaffe & Gauthier 1973, p. 112). Kansky
continued the research of Garrison and Marble (1961) and Berry (1960) in his 1963
paper titled "The Structure of Transportation Networks."

Adams (1971) followed the U.S. highway network research with an analysis of the
domestic airline network, using matrix methods to study airline growth and connectivity.
Nystuen and Dacey (1968) expanded the research agenda to include telephone networks
and other types of fixed infrastructure. Taaffe and Gauthier (1973) followed with a text
that demonstrated and explained how geographers study transportation systems. In 1977
Haggett, Cliff and Frey published two volumes explaining locational analysis and
methods that are still applied in geography today. The major contribution of this work
was the development of spatial models of network structure relating to location, density,
and change over time. Haggett, Cliff and Frey also explored network nodes and the
hierarchical structures they form within networks.

Geographers have also contributed to network analysis in related fields. Mitchelson
and Wheeler (1994) illustrated the importance of information flows through the U.S. in
the global economy. Longcore and Rees (1996) have shown the importance of
telecommunication infrastructure in financial districts. Hepworth (1990, 1991) has studied
the geography of the information economy. He concluded that IT convergence has led to
the centralization of information activity, while communication between locations has
enabled the decentralization of knowledge creation.

Using FedEx geographic delivery data, Mitchelson and Wheeler (1994) explored the
relationship between information flow and the U.S. city hierarchy within the context of
the global information network. They identified four conditions as important in promoting
economic instability: deregulation, globalization, demassification, and vertical
disintegration. These conditions depend upon the information economy, and the
exchange of information is critical when instabilities arise. The Internet is a tool for
transferring information, data, and ideas, and it acts as a stabilizing force in the global
economy. Mitchelson and Wheeler's analysis can be applied to the Internet to give
insight into information flows and the spatial structure of the information economy, just
as they used the FedEx delivery system to establish a domestic hierarchy.

Longcore and Rees (1996) built upon the work of Moss (1991, 1998), Castells (1989,
1993, 1996, 1997, 1998, 1999), Sassen (1991, 1994, 1995, 1996, 1999, 2000), Dicken
(1994, 1998) and others to study information technology and networks at the local level,
using the financial district in New York City to assess inter-urban information flows. They
found that the decentralization of central-city office activity was enabled by electronic
communications, and concluded that a new urban hierarchy based on inter-urban
information flows was emerging (Longcore & Rees 1996). However, they also concluded
that only global cities could support a sufficient concentration of telecommunication
infrastructure. Demand for such infrastructure exists in larger cities, implying that
infrastructure capacity reflects underlying market conditions and a city's position in the
global hierarchy.

More recently, the information and economic flows of e-commerce3 have been
studied. Leinbach and Brunn (2001) address the rapid growth of IT4 sectors that demand
a technically trained and highly skilled workforce. They conclude that the major cost
burden of communication infrastructure has fallen on the private sector. Button and
Taylor (2001) and Kenney and Curry (2001) illustrate the importance of the Internet and
e-commerce; both conclude that the Internet is an important tool for reducing transfer
costs. Goodchild (in Leinbach & Brunn 2001) addresses the location-theory implications
of the Internet and e-commerce and concludes, "The Internet is more than just another
communications device. It is a newly developed space with the power to give rise to
novel forms of human social interaction in almost any area of human endeavor,
commercial, or otherwise" (Goodchild 2001, as quoted in Leinbach & Brunn 2001, pp.
63-63). Malecki and Gorman (2001) study the physical structure of the Internet and the
importance of geography, asserting that the Internet illustrates "both old and new
geographies" (Malecki & Gorman 2001, as quoted in Leinbach & Brunn 2001, p. 103).
Still others have studied e-commerce in firm, regional, and global contexts: Aoyama
(2001), Cobb (2001), Coe and Yeung (2001), van Geenhuizen and Nijkamp (2001) and
Langdale (2001).

Cukier (1998) tackles the geography of the Internet on a global scale, concluding that in
a postmodern world of consumerism and industry geography matters, but that in a digital
economy, where information is the main product of value, connectivity is what really
matters. Recent claims have touted the "death of distance" (Cairncross 1997) in
business. President Bill Clinton's 1998 address to the United Nations proclaimed the
Internet responsible for "the death of distance," and he asked the United Nations to
support the new technology (Clinton 1998). Dodge and Kitchin (2001) have

3 E-commerce refers to the exchange of goods and services via the Internet.
4 IT refers to information technology.


continued to stress that geography remains important despite these claims. They detail

a literal, conceptual, and metaphorical mapping of information and communication

technologies and cyberspace and conclude that even in cyberspace, geography matters.

Warf and Purcell (2001) contend that geography and location remain relevant because
the Internet exhibits a definite spatial structure that reinforces existing relations of
wealth and power. They acknowledge that "though deregulation and digitization have
severely attenuated the linkages between money and space" global money does not
"presuppose the disappearance of the nation-state, but rather a rearticulation of its
functions" (Warf & Purcell 2001, p. 240).

When users browse the web, traveling from site to site and location to location,
geography is of little relevance to them. The geography of the Internet's infrastructure,
however, is of great relevance. Wilson (2001) reminds us that seeking territory in
cyberspace has both "metaphorical and real geographic elements." Wheeler, Aoyama
and Warf (2000) concentrate on the geographic distribution of telecommunications and
discuss how changes and innovations in the economic system are catalyzed by
telecommunication networks. Their volume describes how telecommunications have
brought about the restructuring of cities such as Atlanta, Phoenix, and Sunderland,
England, and covers the geography of Internet real estate, telecommuting, and urban
planning.

Two ideas about the effects of telecommunications on cities have recently been
advanced. The first is that information transfer will replace distance, causing the death
of cities (Gilder 1995). "Some social theorists argue that new information technologies
will inevitably lead to the economic decline of cities as electronic communications make
it possible to replace the face-to-face activities that occur in central locations" (Moss
1998). "We are headed for the death of cities" (Gilder 1995). The second idea, which is
more consistent with the continued population growth of cities, is that
telecommunications technologies are not a replacement for personal interaction but an
enhancement of it.

It is also possible that telecommunications are not a substitute for face-to-face
interactions, but in fact these two forms of information transmission are
complements. If they are complements, then we should expect cities and [selected
urban] space to get more important as information technology improves. (Moss

The implication is that telecommunication infrastructures are likely to reinforce existing

trends rather than create divergent trends.

The analysis of transportation and telephone networks has long been an important
research topic in geography. Nonetheless, studies of the Internet and its infrastructure
have been scarce. Geographic analysis of the Internet has increased in recent years,
though it has mostly described the growth of Internet hubs and capacity and the
geographic distribution of networks and of the Internet's users and traffic. Little emphasis
has been placed on the complex connectivity of the Internet from a network analysis
standpoint, particularly that of the Internet backbone network.

Geographic Research of Telecommunication Infrastructures

Geographers have begun to analyze the Internet and related telecommunication
infrastructures, including colocation facilities, Network Access Points (NAPs),
Metropolitan Area Exchanges (MAEs), Internet Exchange (IX) Points, marine cable
landings, Points of Presence (POPs), Internet backbones, fiber routes, and cellular
towers. Recent work has also examined wireless structures (Gorman & McIntee 2003),
Web content and information production and distribution on the Internet (Zook 2001,
Wilson 2002), and the locational attributes of colocation facilities (McIntee 2001).

The colocation industry emerged as demand for the interconnection of
telecommunication networks rose dramatically with the growth and proliferation of the
Internet. These facilities serve as physical interconnection hubs for ISPs, Internet
backbones, and servers, and are known by many nicknames: telehouses, telecom hotels,
and Internet hotels. The location characteristics of other types of interconnection hubs,
namely NAPs, MAEs, IX Points, and marine cable landings, have also been examined
(McIntee 2001). These interconnection facilities cluster in cities rich in telecommunication
infrastructure, specifically the fiber-optic networks they interconnect. Evans-Cowley,
Malecki and McIntee (2002), Malecki (2002), and Malecki and McIntee (2003) have
further explored the colocation industry, the geographic location of these facilities, and
their effect on urban places and urban structure. Telegeography (2001b) has also
documented the geographic distribution of the colocation industry by compiling datasets
of colocation facilities.

Points of presence (POPs) are facilities that connect local Internet service providers to
Internet backbones. Grubesic and O'Kelly (2002) found that the greater San Francisco
area led all U.S. metro areas in the number of POPs in 2000, likely because of the high
concentration of Internet networks housed in the city.

Another important telecommunication infrastructure studied by geographers is the cell
tower. Gorman and McIntee (2003) found a strong and significant relationship between
cell towers in a C/MSA (Consolidated Metropolitan Statistical Area or Metropolitan
Statistical Area) and both the volume of data traffic and the location of colocation
facilities there, implying that market size and urban growth are key to the location of cell
towers. Of the 65,000 cell towers in the U.S. in 2001, 41,204 were located in C/MSAs
(Gorman & McIntee 2003).

Web content, hosting facilities, and the location of information production have also
been studied in geographic contexts (Malecki 2002, The Economist 2001). Zook (2001)
has compared the physical locations of adult video content providers with those of other
online content providers. He concluded that there is a "stronger connection between
Internet content and information-intensive industries than between the Internet and the
industries providing the computer and telecommunications technology necessary for the
Internet to operate" (Zook 2000, pp. 411-412). Wilson (2002) has analyzed the
geographic location of virtual casino domains. He found that in 2001 the U.S. led the
distribution of casino domains, housing roughly 25% of the world's casino sites, but that
their urban distribution within the U.S. was widely dispersed compared with the Internet
industry at large.

Considerable research has examined the effects of new communication technologies
on cities and the urban hierarchy. These studies revealed that communications
infrastructure has agglomerated disproportionately in the largest metropolitan regions
(Malecki & Gorman 2001, Malecki & McIntee 2000, McIntee 2001, Moss & Townsend
1998, Wheeler & O'Kelly 1999), a pattern that reinforces the predominance of those
metropolitan areas within the urban hierarchy. This concentration of infrastructure in the
largest cities supports the view that telecommunications infrastructure and technology
will not bring about the decline of cities but will instead help cities remain viable as
centers of commerce. Although past research on telecommunications includes the
networks that make up the Internet, little is presented beyond the topology of nodes and
links (Hepworth 1990, Kellerman 1993, Malecki & Gorman 2001). Brunn and Leinbach's
(1991) Collapsing Space and Time: Geographic Aspects of Communication and
Information covers several topics of importance: geography and communications;
information economies; communications technologies and regional development; and
the social dimensions of information and communications. Longcore and Rees (1996)
used Manhattan as a case study of how recent changes in information technology have
influenced city structure. Although they concluded that the financial district's land market
might lead the tightly focused financial district to become more geographically flexible,
they recognized the continuing importance of face-to-face contact and of proximity to
sophisticated telecommunications infrastructure (Longcore & Rees 1996).

Geographic Research of the Internet Backbone

Within the past decade various researchers and organizations have examined the
Internet backbones. Most of this research has focused on backbones in the U.S., though
some has discussed the structure of the Internet in Europe. The following section
reviews the geographic literature on the Internet backbone network.

Telegeography (2000, 2001) reported that the international backbone routes with the
largest bandwidth capacity in 2000 were, not surprisingly, those between London and
New York (26,680.5 Mbps) and between London and Paris (24,340.5 Mbps5). The
backbone link between San Francisco and Tokyo had the largest bandwidth capacity
between North America and Asia in 2000 (7,550.0 Mbps) (Telegeography 2000, 2001).
Telegeography (2001) also reported that New York was the Internet's most global metro
area, directly connected to 71 countries in 2001. Five of the top ten cities cited in that
report were intercontinental backbone hubs located in the U.S. This supports the view
that Internet traffic relies heavily upon the U.S. as a centrally located switching hub in the
global telecommunications network.

Between 1997 and 1999, the U.S. experienced large, rapid growth in the Internet
backbone network, with a 420% increase in data transfer capacity (Moss & Townsend
2000). Moss and Townsend (2000) also reported an increasing concentration of Internet
backbones in several centrally located mid-sized cities. WorldCom, Sprint, and Cable &
Wireless dominated the Internet backbone network in 1999, controlling about 55% of the
domestic market (Telegeography 2000). By 2001, with mergers and acquisitions flooding
the market, WorldCom controlled 37% of the domestic market (O'Kelly & Grubesic 2002).
O'Kelly and Grubesic (2002) found

5 Mbps: million bits per second, a measurement of data transfer rate.

that East Coast cities in the U.S. had a high concentration of Internet bandwidth, with
concentration diminishing gradually toward the west. They attributed the high
concentration of links and bandwidth in Washington, D.C. to the combination of its role
as a capital city and its high-tech industry. They also concluded that Chicago was the
most accessible city in the U.S. backbone network because it had more Internet
connections, or pathways, between it and every other U.S. city.
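The notion of accessibility used in these backbone studies can be illustrated with a shortest-path measure. The sketch below computes a Shimbel-style index: the sum of the fewest-link (hop) distances from each city to every other city, where lower totals mean greater accessibility. The network is a small hypothetical example rather than data from O'Kelly and Grubesic (2002) or Wheeler and O'Kelly (1999). The index is one common accessibility measure, not necessarily the one used in those studies.

```python
from collections import deque

def hop_distances(adj, source):
    """Fewest-link (hop) distances from source to every reachable node, by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def shimbel_accessibility(adj):
    """Sum of shortest-path hop counts from each node to all other nodes.

    adj: dict mapping each node to a list of its directly linked neighbours
    (links listed in both directions). Lower totals mean higher accessibility.
    """
    totals = {}
    for node in adj:
        d = hop_distances(adj, node)
        if len(d) != len(adj):
            raise ValueError("network is not connected")
        totals[node] = sum(d.values())
    return totals
```

In a hypothetical five-city network where Chicago links directly to New York, Washington, Dallas, and San Francisco, New York also links to Washington, and Dallas also links to San Francisco, Chicago has the lowest total (4). Each of the other cities totals 6.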

Wheeler and O'Kelly (1999) studied the accessibility levels of 31 Internet backbones in
1997. They found that Washington, D.C., Chicago, San Francisco, New York, and Dallas
were the most accessible U.S. cities in the Internet backbone network. Malecki and
Gorman (2001) used connectivity matrices to determine U.S. city hierarchies based on
1-hop and 2-hop links in the bandwidth-weighted matrix. They concluded that network
analysis of the Internet illustrated old and new geographies: the Internet has changed the
meaning of distance, space, and the geographical significance of places, following old
routes while also establishing new ones (Malecki & Gorman 2001 in Brunn & Leinbach,
p. 103). The use of binary connectivity matrices confirmed the "strong spatial bias and
hierarchical structure of U.S. cities, one that differs from the conventional
population-based hierarchy" (Malecki & Gorman 2001 in Brunn & Leinbach p. 103).
Malecki and Gorman found that the major cities in the economy double as the major
nodes of the Internet.
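The 1-hop and 2-hop logic of a connectivity matrix can be shown with ordinary matrix multiplication. If C records direct links, then C² counts the two-link routes between each pair of cities. The row sums of C + C² then rank cities by how well connected they are within two hops. The Python sketch below assumes a small hypothetical binary matrix, not Malecki and Gorman's (2001) data. A bandwidth-weighted analysis would substitute link capacities for the 0/1 entries, though how products of capacities should be interpreted is a modeling choice.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def connectivity_within_two_hops(c):
    """Return T = C + C^2 and the row sums of T.

    C[i][j] = 1 if a direct (1-hop) link joins i and j, else 0.
    C^2[i][j] counts the 2-hop routes i -> k -> j; its diagonal counts the
    round trips i -> k -> i (each node's degree), as in the classical
    summed-powers formulation. A larger row sum means a better-connected node.
    """
    n = len(c)
    c2 = mat_mult(c, c)
    t = [[c[i][j] + c2[i][j] for j in range(n)] for i in range(n)]
    return t, [sum(row) for row in t]
```

Consider a hypothetical four-city network ordered Chicago, New York, Washington, and Dallas, with links Chicago-New York, Chicago-Washington, Chicago-Dallas, and New York-Washington. The row sums are [8, 7, 7, 4], which places Chicago first.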

History of the Internet and Parallel Developments in Telecommunications

Early Communication Networks

Communication networks date back to antiquity. The story of Phidippides, who in 490
B.C. ran 36.2 km from Marathon to Athens to warn the Athenians of an approaching army,
is one of the earliest examples of a communication network (Holzman & Pearson 1995,
p. 1). Early communication systems included the Pony Express (1860-1861), pigeons
(used as early as 776 B.C. and still used in 1981 by an engineering group in California),
mirrors and flags, fire beacons, watchmen, and senators. The first telegraphic device
reportedly appeared around 350 B.C.: a rudimentary device in Italy that used fire signals
to direct the flow of water. A two-thousand-year gap in telegraphic devices followed,
until the telescope was invented in 1608 by Hans Lippershey (Holzman & Pearson 1995,
p. 31). More modern telecommunication, closely related to today's communication
networks, arrived in 1844 with the invention of the telegraph (Hugill 1999, Wilson &
Corey 2000).

A Brief History of Telecommunications

The telegraph has been described as the earliest ancestor of the Internet; like the
telephone and the fax machine, the Internet has been built upon the foundation of the
telegraph (Standage 1998). The telegraph was the first in a long series of inventions and
technologies designed to exchange information electronically (Lebow 1995). The printing
press cannot be overlooked, however, as it allowed the first "one-to-many"
communication and introduced a mass-produced format that allowed fairly rapid
exchange of information and data. Telecommunications changed little until networked
computers allowed "many-to-many" communication (Malecki 2002). Many-to-many
communication has been both an aid and a hindrance. Users are able to e-mail news or
information to multiple recipients, for example, but they are also subject to unwanted
e-mail and advertisements known as "spam."6 The media constantly inform us that we
are in the midst of a communications revolution driven by rapidly changing information
transfer technologies. It may be worth acknowledging, however, that the electric
telegraph was a far more disruptive technology in its era than the Internet has been in
ours (Standage 1998). The two technologies most often credited with an impact as
significant as the Internet's are the printing press and the telegraph (Malecki 2002).

6 Spam is the term used to describe unsolicited "junk" e-mail sent to large numbers of people
to promote products or services.

Since the arrival of the telegraph, two notable differences between telecommunications
and transportation have become apparent. First, moving intangible goods such as data
and information is not the same as moving tangible goods (Hillis 1998). Second,
telecommunications functions as a network used simultaneously by many users sending
and receiving such intangibles (Rosenberg 1994), which gives telecommunications a
great influence on business. In short, businesses began to use the new technology to
exchange information without the physical movement of people or animals. Banks and
financial corporations were the first types of businesses to take advantage of this new
technology (Beniger 1986, Gabel 1996). Telecommunications revolutionized interaction
between individuals and institutions and created a new platform for networks and
networked systems.

Communication technologies have become increasingly bundled in recent years, with
many types of communication devices being developed to serve multiple purposes with
maximum convenience. Kellerman (2002, p. 15) describes how it has become possible to
use the computer as a telephone, fax, and TV, and to receive several of these services
from a single service provider. He muses that "this fusion may possibly mature into a
single appliance for information consumption and production, as well as so-called public
networks of data and software" (Halal 1993, as cited in Kellerman 2002, p. 15). Phone
companies such as Sprint and Verizon now offer cellular phones capable of wireless
Internet access, fax, photos, and more (www.sprint.com, www.verizon.com). Phone
companies also offer digital subscriber line (DSL) service in addition to regular phone
service. Cable companies have likewise expanded their services to compete in the new
high-speed Internet access industry. Media conglomerate Time Warner's cable division
recently introduced Road Runner to compete in the Internet service provider market.
Road Runner is an online service providing unique broadband7 content, services, and
lightning-fast access to the Internet; it is delivered to the subscriber's computer over the
same upgraded cable systems that bring cable television into the home
(http://www.roadrunner.com). Many industry analysts now predict that, with the growth of
data networks, voice traffic will increasingly travel over Internet protocol (IP) technology.
With increasing data traffic, the demand for Internet fiber is on the rise (National
Research Council 2001).

Telecommunications and the City

Location Decision and Telecommunications

Telecommunications technology has been booming since the early 1970s, and major
advances and changes in that technology have influenced business location decisions.
Different types of firms are scrambling to locate in areas rich in technology
infrastructure. Traditional location theory typically considers local inputs and outputs,
transferable inputs and outputs, climate, labor supply, taxes, and the local economy.
Many firms continue to use these traditional location factors but are beginning to
incorporate new ones into their location decisions, especially high-bandwidth Internet
connections. These firms include those involved in banking, research, marketing,
telecommunications, and many more. High-bandwidth connectivity is an increasingly
attractive asset in firms' location decisions, but at the same time, a firm's geographic
location remains relevant. The assumption that a firm can ignore geographic location
because of technology is false; geographic location is still an important factor for many
firms. Firms that depend upon technology seek out locations with strong technological
infrastructure, which might include switches, points of presence (POPs), NAPs,
gateways, and so on.

7 Broadband describes a transmission facility having a bandwidth sufficient to carry
multiple voice, video or data channels simultaneously.

The presence of modern telecommunication technology reduces transfer and
information acquisition costs, and it dramatically reduces the time needed to transfer
data. Transfer costs of physical commodities and production inputs for assembly have
traditionally been a main factor in location decisions, and still are. However, the way
some goods are transferred has changed with the shift in emphasis toward information
services and intangibles. Data, electronic mail, music, movies, and news need not be
physically transported; they can be transferred electronically. Other goods, such as food,
clothing, bicycles, and other tangible goods, must be physically transported. Many firms
use both types of transfer. A company that sells women's apparel, for example, might be
involved in e-commerce, but the clothing itself must still be shipped to the customer, so
geographic location continues to be an important factor in that firm's location decision.
The firm must be "connected" technologically while also being able to ship goods to
customers at minimum cost.

Telecommunication networks are "friction reducing" technologies that "enable transfers
between remote locations for costs that are substantially lower than the physical transfer
of information between them" (Salomon 1988), whether by human interaction or by hard
copy.

Whittaker Associates has identified the most important site-selection factors for the business services industry. Of the 51 factors studied, the top ten are the following:

• Telecommunication services
• Secondary education quality
• Effective cost of skilled labor
• Effective cost of unskilled labor
• Availability of executive, administrative, and managerial workers
• Administrative support
• Geographic proximity to markets
• Access to business and technical services
• Business taxes
• Energy dependability

Telecommunications technology will increase the potential of cities and has, in fact, since the 1980s revitalized the central business districts of the world's leading cities and international business centers: New York, Los Angeles, London, Tokyo, Paris, Frankfurt, Sao Paulo, Hong Kong, and Sydney. These cities have reached their highest density of firms ever, providing further evidence that cities are not in decline (Sassen 2000). Aware of the benefits that sophisticated telecommunication technology can bring to urban centers, many cities are welcoming this infrastructure. Sophisticated technology can benefit a city by attracting the firms that seek such infrastructure, such as financial, media, and web-based firms, which in turn provide jobs, education, and services for those who have access to the infrastructure. Those who do not have access to the technology are at a disadvantage. Telecommunication technology is changing the economies of networks, shifting the emphasis toward service industries rather than manufacturing.

Many firms now count technological infrastructure among the important factors in their location decisions. The most advanced telecommunications infrastructure is found in the country's largest cities, implying that cities better connected with sophisticated telecommunications infrastructure are more attractive locations. These large urban centers are able to cash in on the technological amenities they offer. Many public services, such as libraries, tax and finance administrations, and criminal justice systems, are information intensive and depend upon computers, telephones, and sophisticated information retrieval and imaging systems. "A city's future as an information center depends on information-producing activities that occur through both face-to-face and electronic communications" (Moss 1998).

Infrastructure in Cities

The intra-urban patterns of different types of telecommunication infrastructure are highly interdependent. For example, the physical structure of fiber-optic networks on the ground depends greatly upon interconnection facilities, while the location of colocation facilities depends just as much upon the concentration of fiber networks. The cities that lead in bandwidth rank also lead in colocation rank (McIntee 2001); that is, the cities with the greatest concentration of bandwidth in the U.S. also have the greatest concentration of colocation facilities. The relationship between different types of Internet infrastructure resembles the proverbial question "which came first, the chicken or the egg?", and the relationship between colocation facilities and Internet bandwidth is no exception. It is difficult to determine which is more dependent upon the other, as colocation facilities are interconnection points for Internet backbones, while at the same time colocation facilities locate in close proximity to termination points of Internet backbones.

The Internet has sparked concern about the need for new legislation and policy to protect those who use the Internet as well as those who are affected by the Internet and its infrastructure. There are many concerns about the effects of the Internet and the location of its infrastructure within metropolitan areas. The negative effects can be grouped under one term: the digital divide (Sassen 2000, Wilson & Corey 2000, Wheeler, Aoyama & Warf 2000). The digital divide is shorthand for the widening gap between rich and poor: those with wealth and affluence have access to the increasingly valuable advantages of sophisticated technology, such as the Internet, while the poor are further disadvantaged because they do not have equal access to this valuable tool.

Firms that benefit greatly from Internet infrastructure, especially those involved in financial services, media, and consulting, are treating infrastructure as an increasingly important factor in location decisions (Finnie 1998, Kotval 1999, Longcore & Rees 1996). Communities have been using their telecommunications infrastructure as a strategy to attract new business and to increase their overall economic competitiveness. In a European survey of 500 companies, telecommunications was cited as the second-most important factor (Graham & Marvin as cited in Kotval 1999). Cities such as New York, Boston, and Amsterdam have earned a competitive advantage by establishing "teleports" (satellite linkages connecting to local telecommunication networks), which effectively make them globally networked cities. For those who do not have access to the technology, however, ever more sophisticated infrastructure could create problems. Universal access to information and communication technologies is critical to closing the gap between economically disadvantaged and advantaged social groups. Graham (1999) already describes the unfavored zones within cities as "network ghettos," places of low telecommunications access and concentrated social disadvantage. "Uneven global interconnection via advanced telecommunications becomes subtly combined with local disconnection in the production of urban space" (Graham 1999). Those who could benefit most from the infrastructure as a tool to enhance quality of life, job searches, education, and communication may be the very people who are excluded from the sophisticated technology.

Poor and less-advantaged cities that are reluctant to welcome Internet infrastructure and telecommunication competition may be disadvantaging themselves. Finnie (1998) studied 25 major cities and determined that the global cities that remain competitive in attracting business firms lack strict regulation of the telecommunication sector. According to Finnie, telecommunications services are becoming increasingly central to business success or failure as competition increases and sophisticated technology becomes more widely available. As a result, the gap between the haves and the have-nots could well be narrowing. Because of all the benefits cities receive when Internet infrastructure is implemented, it is not feasible to ban the growth of these sophisticated networks within urban areas.

Kotkin (2000) has claimed that the digital era we are currently experiencing is a period of advancement not seen since the industrial revolution. It is hard to argue with this statement. The dawn of the Net has had a large impact on the economic and social geography of America, and some believe it is redefining the American city hierarchy (Kellerman 2002, Kotkin 2000, Townsend 2001). Numerous titles have been devoted to the digital revolution, the Internet, and the increasing interconnectedness of our world. Six Degrees (Watts 2003) and Linked (Albert & Barabasi 2002a) discuss the overlap and interconnection of networks in modern-day life. Other texts, such as Information Tectonics (Wilson & Corey 2000), The New Geography (Kotkin 2000), and Worlds of E-Commerce (Leinbach & Brunn 2001), explore the economic, geographic, and social implications of the digital revolution from the perspective of various disciplines.

There is a growing literature on the Internet, its history and composition, and the effect of proliferating telecommunication networks on cities and complex regional networks. This research helps to provide a perspective for examining the U.S. Internet backbone network. This dissertation will contribute to that literature by analyzing the connective properties of the U.S. Internet backbone network.

Data and Methodology

Internet backbone data have been obtained from the George Mason University School of Public Policy. The dataset was created by researchers at George Mason University in 2003. The Internet backbone dataset provides a measure of the amount of data capacity and connections that a consolidated metropolitan statistical area (C/MSA) has to move information to another C/MSA. The data were calculated from the total long-haul fiber capacity, or bandwidth, connecting a C/MSA to other C/MSAs. Bandwidth is the term used to describe transmission speed, which is measured in bits per second. According to Malecki and Gorman (2001), "bandwidth is what makes communications, specifically Internet Protocol (IP), different from transport networks. The limiting factor of IP networks is not distance, but the capacity of the bandwidth available on the network from one location to another" (p. 90). The normal speed of a voice call is 64 kbps, and transmission speeds above 64 kbps are generally categorized as broadband (Huston 1999, pp. 160-171). The Internet backbone providers are the companies that own the framework of the Internet. This network connects C/MSA nodes and transports data across long geographic distances. Multiplexing is the process of sending multiple streams of information on a backbone at the same time in the form of a single signal; by multiplexing, higher bandwidths can be achieved. In 2001, 48 private providers operated the Internet backbone networks. These firms range from large telecommunication carriers such as AT&T, MCI WorldCom, Sprint, IBM, and Cable & Wireless to smaller, lesser-known firms. The backbone providers are called autonomous systems (AS), which means they operate independently from other systems, setting their own policies and network structure (Malecki & Gorman 2001, pp. 92-93). These independent networks interconnect to form a larger network, thus creating the Internet backbone. In 1995, Huitema suggested that the Internet backbone network is the best indicator of the geography of the Internet. The amount of bandwidth between C/MSAs is not equal. Chapter 3 analyzes the network as though the bandwidth connections are equal, while Chapter 4 addresses the inequalities by adding weights to the analysis.
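The bandwidth figures above can be made concrete with a small calculation. The sketch below assumes idealized multiplexing with no framing or signalling overhead: it counts how many 64 kbps voice channels fit into a link of a given capacity and applies the 64 kbps broadband threshold cited from Huston (1999). The function names are hypothetical, and the T1 figures in the usage lines (24 channels, 1.536 Mbps of payload on a 1.544 Mbps line) are standard telephony values, not data from this study.

```python
VOICE_CALL_BPS = 64_000  # normal speed of a voice call: 64 kbps


def voice_channels(capacity_bps: int) -> int:
    """Whole 64 kbps voice channels a link could carry if multiplexed.

    Framing and signalling overhead are ignored, so this is an upper bound.
    """
    return capacity_bps // VOICE_CALL_BPS


def is_broadband(rate_bps: int) -> bool:
    """Threshold used in the text: speeds above 64 kbps count as broadband."""
    return rate_bps > VOICE_CALL_BPS


if __name__ == "__main__":
    # A T1 line: 24 channels x 64 kbps = 1.536 Mbps payload (1.544 Mbps with framing).
    print(voice_channels(1_536_000))  # 24
    print(is_broadband(64_000))       # False
    print(is_broadband(1_544_000))    # True
```

Integer division is used deliberately: a partial channel cannot carry a call.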

A geographic information system (GIS) was created with the Internet backbone network data, using ArcGIS from Environmental Systems Research Institute (ESRI). Several types of telecommunications data with different characteristics and capabilities were also incorporated into the GIS. The GIS data were statistically analyzed to determine spatial relationships between the Internet backbone network and the other telecommunications infrastructure, and to understand the distribution of Internet bandwidth. The telecommunication infrastructure data were geocoded, giving each datum spatial attributes so that it can be analyzed in conjunction with other types of data in a common information system. Telecommunication infrastructure data obtained for this research project include the following:

• Telephone Switches (Digital, Wireless)
• Cellular Towers (Towers, Antennas)
• Fiber Lit Buildings
• Colocation Facilities
• Network Access Points (NAPs)/Metropolitan Area Exchanges (MAEs)
• Marine Cable Landings
• Fiber Points of Presence (POPs, termination points)

The following is a description of the data used in the GIS and statistical analysis

performed in Chapter 6:

Telephone switches. This database includes the location, capability, and ownership of telephone switches. The description includes details such as which switches are wireless, digital, or integrated services digital network (ISDN). Customers might purchase data on only the single type of telephone switch they are interested in, such as wireless switches. The data are geocoded and can be used in a geographic information system.

Cellular towers. This is a complete database of cellular towers in the United States, including the tower owner and the capabilities of the structure. Data are also available on the auction results for metropolitan areas. Auctions provide an economic valuation of regions by private industry for the implementation of a technology. The auction values of a region provide new insight into how emerging technologies are affecting the urban hierarchy of regions.

Fiber lit buildings. This is a very extensive database that includes numerous fiber carriers. The description includes carriers, street addresses of termination points, type of fiber, capacity and status of fiber, and common language location identifier (CLLI) codes. The data are geocoded and address matched, determining the building and spatial attributes of the fiber location. This database includes dark fiber, lit fiber, fiber currently in existence, fiber "in the pipe," and network expansion details of future fiber locations. Fiber loops are also included in this database, as are carrier "lit" buildings and metro fiber routes (Source: Geo-tel 2002).

Internet interconnection facilities/colocation facilities. These facilities provide network providers with floor space for their network equipment within a secure building. The building is typically equipped with appropriate heating, ventilation, and air-conditioning (HVAC), enhanced fire suppression, electrical connections, and diesel-powered generators to guard against commercial power failures. Private companies often choose to interconnect in these facilities as well, to avoid costly local loop charges and to have the ability to cross-connect to their carrier of choice. Companies also colocate to use multiple carriers, so that if their primary carrier's network fails, they can reroute network traffic to the backup carrier already in place.

Network access points (NAPs), metropolitan area exchanges (MAEs), Internet exchange (IX) facilities, carrier hotels (Geo-tel 2002). These are Internet interconnection facilities on a grand scale. Some of these facilities are considered "public" interconnection facilities, while others (MAEs) are privately held. Many of these mega facilities were original interconnection points for the early Internet.

Marine cable landings. These data show the exact location and termination of the marine cableheads and list the carriers located in the cable. A description of each carrier's fiber in the marine pipe is also available. The spatial attributes for these data are also included in the database (Source: Geo-tel 2002).

Point-of-presence data (POP). This data includes the carrier of the POP and the

spatial attributes of the POP. This data set consists of carrier fiber points of presence

(POPs) that signify termination of fiber lines that provide connectivity to a location,

typically office buildings (Geo-tel 2002).

In addition to telecommunication infrastructure data, a database with descriptive

statistics of the metropolitan areas was compiled. The database included population,

bank deposits, income, and the local economy's dependence on specific sectors, such

as finance, insurance, and real estate (FIRE), as well as other factors that could be used

as interactive variables in the model. The descriptive data will be discussed more in-

depth in Chapter 6.


The data have been modeled in a geographic information system (GIS), using ESRI's ArcGIS software, to study the distribution of telecommunication infrastructure in conjunction with infrastructure of like kind, as well as to study the hierarchy of nodes. Chapters 4 and 5 of this dissertation use the long-haul fiber optic bandwidth data: Chapter 4 uses the unweighted long-haul fiber optic bandwidth data for the unweighted analysis, and Chapter 5 adds weights to the analysis. Chapters 4 and 5 include maps that display the long-haul fiber optic bandwidth data with other types of infrastructure; however, the analysis for these chapters was performed solely using the fiber network data. Chapter 6 analyzes the Internet backbone data in conjunction with various types of telecommunication infrastructure data as well. The procedures performed in Chapter 6 also incorporate the descriptive data for C/MSAs into the analysis.



The matrix-based frameworks for analyzing the impact of nodal or linkage distribution on the overall connective properties of a network have not been applied to the U.S. Internet backbone network. This research intends to identify the most critical links and nodes in the domestic long-haul fiber network in the U.S. The research methods contained within this chapter can then be applied to other types of telecommunication networks to answer similar research questions.

Examining telecommunication networks as graphs has proven highly useful in answering the proposed research questions and testing the hypotheses. This chapter introduces and explains graph theory and matrix multiplication. The methods reviewed here provide a framework for network analysis and will be used to examine the Internet backbone network in Chapters 4 and 5. The purpose of this chapter is to provide a thorough explanation of the graph theory and matrix multiplication used in the following chapters of this dissertation.

Introduction to Graph Theory

A fundamental question in network analysis is the degree to which the nodes are interconnected. The connectivity of a network is defined by the overall degree of connection between all vertices, which is probably the most important structural property of the network (Taaffe & Gauthier 1973, p. 101). The methods in this section are based largely on Taaffe & Gauthier's 1973 publication, Geography of Transportation. Any network can be represented as a graph (Haggett, Cliff & Frey 1977, Garrison 1960, Kansky 1963). As spatial structures, networks are extremely complex in nature, which makes them difficult both to describe and to analyze. By simplifying networks, we are able to study their characteristics. To apply graph theory to a network, it is necessary to model the network in the form of a graph. As the network is simplified in preparation for analysis, some of the information about the network is purposely discarded; only the information most relevant to graph-theoretic analysis is taken into account. For this reason, not all networks should be described in terms of graph or matrix theory. Topological analysis takes into account only interconnection, excluding properties such as shape, direction, and size. When the network is studied as a graph, only the topological properties of the network are considered; the large range of other characteristics that might be identifiable with various networks is not analyzed in graph theory.

Graph theory breaks the network into points and lines in an abstract manner. Although it does not model the real world directly, it provides measurement for some structural properties of "a real-world system if that system is idealized as a set of points connected by a set of lines" (Taaffe & Gauthier 1973, p. 101). In the simplest form, networks can be represented by a series of vertices (representing nodes) and a series of edges (representing links), with a relationship of incidence that associates each edge with two vertices. Only the presence or absence of a connection is known for each pair of nodes, and this is represented in graph form. There are two ways to measure the described network: (1) a single number or (2) a vector of numbers. A single number describes the aggregate geometrical pattern of the network, while a vector of numbers measures the relationship of the individual components of the network to the entire network (p. 101).
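The vertex-edge representation and the two kinds of measures can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical five-node edge list (not one of the networks in this chapter): the counts of vertices and edges serve as single-number measures, and each node's degree (the number of links incident to it) is used here as one simple example of a vector of numbers describing individual components.

```python
from collections import Counter

# Hypothetical edge list: each edge is incident to exactly two vertices.
EDGES = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]


def vertices(edges):
    """Set of vertices that appear in at least one edge."""
    return {node for edge in edges for node in edge}


def degree_vector(edges):
    """Number of links incident to each vertex (a vector of numbers)."""
    counts = Counter()
    for u, w in edges:
        counts[u] += 1
        counts[w] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(len(vertices(EDGES)), len(EDGES))  # 5 4  (single-number measures)
    print(degree_vector(EDGES))  # {'a': 1, 'b': 3, 'c': 1, 'd': 2, 'e': 1}
```

Because vertices are derived from the edge list, an isolated node with no links would not appear; a fuller representation would keep a separate vertex set.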

There is minimal information given about this network (see Figure 3-1, Hypothetical Network A), so only primitive measures of connectivity can be derived. The node-link relationships are the only information available from which to draw conclusions about connectivity.

Figure 3-1. Network A

The first measurements that can be taken are the number of links and nodes. In Figure 3-1, there are 8 nodes ($v$) and 7 links ($e$) in the network. Moreover, the network is minimally connected, as there is only one path between any pair of nodes. Note that there are no redundant links within the network, meaning that no node has more than one direct connection to any other node; redundancies occur when more than one link connects the same two places. In any minimally connected network, the number of links is always one less than the number of nodes: $e_{\min} = v - 1 = 8 - 1 = 7$. Note that removing any link in this network will disconnect the network into two parts.
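The two properties of a minimally connected network, $e = v - 1$ and the fact that every link is critical, can be checked mechanically. The sketch below assumes a hypothetical 8-node, 7-link tree, since the exact links of Network A appear only in Figure 3-1; it tests connectivity with a breadth-first search and then deletes each link in turn.

```python
from collections import defaultdict, deque

# Hypothetical 8-node, 7-link tree standing in for Network A (Figure 3-1);
# the actual links of Network A are not reproduced here.
TREE = [(1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (5, 7), (7, 8)]


def is_connected(nodes, edges):
    """Breadth-first search from one node; True if every node is reached."""
    adj = defaultdict(set)
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == set(nodes)


def is_minimally_connected(nodes, edges):
    """Connected, with exactly v - 1 links (e_min = v - 1)."""
    return is_connected(nodes, edges) and len(edges) == len(nodes) - 1


def every_link_is_critical(nodes, edges):
    """True if deleting any single link disconnects the network."""
    return all(
        not is_connected(nodes, edges[:i] + edges[i + 1:])
        for i in range(len(edges))
    )


if __name__ == "__main__":
    nodes = set(range(1, 9))
    print(is_minimally_connected(nodes, TREE))  # True
    print(every_link_is_critical(nodes, TREE))  # True
```

Adding any extra link to `TREE` makes both checks fail, which is the point at which circuits appear.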

Because network connectivity is most meaningful when a network is either

compared to another network or used in measuring growth, another hypothetical network

(B) is shown as an example for comparison (Figure 3-2).

Network B is more complex than Network A. Network B has 8 nodes and 11 links.

This network is more than minimally connected. Most of the nodes in this network are

connected to more than one node. When this type of structure exists, the removal of one

link will not necessarily disconnect the entire network.

Figure 3-2. Network B

In order to compare these two networks (A & B), connectivity measures must be

employed. Graph theory provides various simple measures. The most often employed

measures include the gamma and alpha indices.

The Gamma Index

The gamma index is the ratio of the number of edges in a network to the maximum number possible in that network:

$$\gamma = \frac{\text{actual edges}}{\text{max edges}} = \frac{e}{e_{\max}}$$
The number of links in examples A and B can be obtained by counting: there are 7 links in example A and 11 links in example B. The maximum number of possible links ($e_{\max}$) can be computed from the number of nodes in the system. If the network is represented as a planar graph (one where intersections occur only at nodes), the addition of each node to the system increases the maximum number of links by 3. This holds true for any planar network of more than two nodes. Because the graph is planar, new links cannot intersect without the addition of a new vertex. The maximum number of links is therefore $e_{\max} = 3(v-2)$, and the gamma index becomes

$$\gamma = \frac{e}{e_{\max}} = \frac{e}{3(v-2)}$$
When the gamma index is used to measure connectivity relative to the maximum, the relationship between the number of nodes ($v$) in a network and the maximum number of links ($e_{\max}$) is $e_{\max} = 3(v-2)$. (Planar networks form vertices whenever two edges cross, whereas non-planar networks can have edges that cross without forming vertices.) The gamma index is expressed in terms of a graph-theoretic range: at one end is a set of nodes that have no interconnections, while at the other end of the spectrum is a set of nodes in which every node has a link that connects it to every other node in the network. The connectivity "is evaluated in terms of the degree to which the network deviates from an unconnected graph and approximates a maximally connected one" (Taaffe & Gauthier 1973). The gamma index falls within a range of 0 to 1. Using network A as an example, the gamma index would be

$$\gamma = \frac{e}{e_{\max}} = \frac{7}{3(8-2)} = \frac{7}{18} = .39$$

In network B, the gamma index would be

$$\gamma = \frac{e}{e_{\max}} = \frac{11}{3(8-2)} = \frac{11}{18} = .61$$

In terms of maximal connectivity, the first network is 39% connected, while the second network is 61% connected.
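The gamma calculations for networks A and B can be reproduced with a short function. This sketch assumes, as the text does, a planar network of more than two nodes so that $e_{\max} = 3(v-2)$; exact fractions are used so that rounding to two decimals happens only at the end. The function names are illustrative.

```python
from fractions import Fraction


def max_planar_links(v: int) -> int:
    """Maximum links in a planar network of v > 2 nodes: e_max = 3(v - 2)."""
    if v <= 2:
        raise ValueError("the 3(v - 2) bound applies to networks of more than two nodes")
    return 3 * (v - 2)


def gamma_index(e: int, v: int) -> Fraction:
    """Gamma index: actual links divided by the planar maximum."""
    return Fraction(e, max_planar_links(v))


if __name__ == "__main__":
    for name, e in [("A", 7), ("B", 11)]:
        g = gamma_index(e, 8)
        print(name, g, round(float(g), 2))
    # A 7/18 0.39
    # B 11/18 0.61
```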

Alpha Index

In the discussion of minimal networks, it was noted that removing a link would sever the network into two separate pieces. Links can also be added to a network, increasing its connectivity beyond the minimal structure and adding redundancy and/or alternate paths. Additional links create circuits. A circuit can be defined as a closed path in which the initial node of the link sequence coincides with the terminal node. If a circuit is present, it establishes additional or alternate paths in the network. The number of links added to the minimal network defines the number of alternative paths. The maximum number of independent circuits in a network is also a function of the number of nodes in the network and the number of links necessary for minimal connection between nodes.
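The claim that each link added beyond the minimal structure creates one independent circuit can be illustrated with a union-find (disjoint-set) sketch. This is a standard counting technique swapped in for illustration, not a method from Taaffe and Gauthier: links that join two previously separate groups of nodes form the minimal structure, and every remaining link closes a circuit. The four-node ring with one diagonal is a hypothetical example.

```python
def independent_circuits(nodes, edges):
    """Count links that close a circuit, using a union-find structure.

    Each link joining two previously separate groups of nodes belongs to
    the minimal (spanning) structure; every other link adds one independent
    circuit. For a connected network the count equals e - v + 1.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    circuits = 0
    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru == rw:
            circuits += 1  # u and w already connected: this link closes a circuit
        else:
            parent[ru] = rw  # merge the two groups
    return circuits


if __name__ == "__main__":
    nodes = {1, 2, 3, 4}
    edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]  # ring plus one diagonal
    print(independent_circuits(nodes, edges))  # 2
    print(len(edges) - len(nodes) + 1)         # 2
```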

The alpha index is a ratio measure of the actual number of circuits, given by $(e - v + 1)$, to the maximum number possible in a given network. In a connected network with $e$ links and $v$ nodes, the number of links is one less than the number of nodes ($e = v - 1$) only when the network is minimally connected. When there is a circuit in the network, the number of links is greater than $v - 1$; that is, $e > v - 1$. By subtracting the number of links needed for a minimally connected network ($v - 1$) from the actual number of links ($e$), we obtain the number of circuits in the network. According to Taaffe and Gauthier (1973), this can be expressed as $e - (v - 1) = e - v + 1$. The result is a measure of the number of independent circuits in the network. For a planar network, the maximum number of links is $3(v-2)$, so the maximum number of circuits is $3(v-2) - (v-1) = 2v - 5$. The alpha index is the ratio of the number of actual circuits ($e - v + 1$) to the maximum number possible in a given network ($2v - 5$):

$$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5}$$

The range of the index is from a value of 0 for a minimally connected network to a value of 1 for a maximally connected network. For convenience, the numerical value may be expressed as a percentage of circuitry in a network. The alpha value for network A is

$$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5} = \frac{7 - 8 + 1}{2(8) - 5} = \frac{0}{11} = 0$$

The alpha value for network B is

$$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5} = \frac{11 - 8 + 1}{2(8) - 5} = \frac{4}{11} = .36$$

The first network exhibits no circuitry. In the second network, the maximum possible number of circuits is 11, but there are only 4 circuits; the second network's circuitry is 36% of the maximum.
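A short function reproduces the alpha values for networks A and B. It assumes a connected planar network of more than two nodes, so that the maximum number of circuits is $2v - 5$; exact fractions are used, and the function name is illustrative rather than part of the dissertation's own tooling.

```python
from fractions import Fraction


def alpha_index(e: int, v: int) -> Fraction:
    """Alpha index: actual circuits (e - v + 1) over the planar maximum (2v - 5).

    Assumes a connected planar network of more than two nodes.
    """
    if v <= 2:
        raise ValueError("the 2v - 5 maximum applies to networks of more than two nodes")
    return Fraction(e - v + 1, 2 * v - 5)


if __name__ == "__main__":
    for name, e in [("A", 7), ("B", 11)]:
        a = alpha_index(e, 8)
        print(name, a, round(float(a), 2))
```

A negative result would signal fewer than $v - 1$ links, that is, a network that is not yet connected, for which the index is not meaningful.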

It was mentioned previously that graph-theoretic indices of connectivity are also useful for measuring network growth or change through time. As an example, let us consider our example networks A and B as an idealized sequence of transport development. For the purpose of explanation, two more networks will be added: C and D (Figures 3-3 and 3-4). Network C represents the network prior to A (Figure 3-3), and Network D represents the final network (Figure 3-4). Network C has fewer connections than either A or B, while D has more connections than A, B, or C (Figures 3-1, 3-2, 3-3, and 3-4).


Figure 3-3. Network C

Figure 3-4. Network D

In the beginning stage, illustrated by Network C, there are a few links leading to interior centers. In the next stage, illustrated by Network A, growth is evident: the network has expanded and includes all of the region's nodes. The growth process continues in Network B (Figure 3-2). Network D is an example of a mature network (Figure 3-4). The number of nodes has remained constant during the network's growth, but the connectivity of the network has changed. By using the alpha and gamma indices, we can determine to what degree the network connectivity has changed, as well as identify the change in the network's spatial structure.

Figure 3-5. Stages of network development

If we arrange the indices in a table, we can see that the connectivity indices increase as the network grows: as the network becomes more structurally complex, both the gamma and alpha indices increase.

Table 3-1. Structural indices for sequence of network development

Stage                  γ      α
Stage 1, Network C    .22
Stage 2, Network A    .39    .00
Stage 3, Network B    .61    .36
Stage 4, Network D    .78    .63

Three basic network configurations are used to relate the gamma and alpha indices to more specific network characteristics: spinal, grid, and delta. The spinal network shares the characteristics of a minimally connected network: every node is connected to at least one other node, and traffic can flow between the nodes, but by only a single path. The number of links in a minimally connected network is always one less than the number of nodes ($e = v - 1$). We can conclude that the gamma index, $\gamma = \frac{e}{3(v-2)}$, for a minimally connected network will be

$$\gamma = \frac{v - 1}{3(v - 2)}$$

The alpha index is $\alpha = 0$, because there are no circuits in a minimal network. For the spinal network,

$$\alpha = \frac{(v - 1) - v + 1}{2v - 5} = \frac{0}{2v - 5} = 0$$
The delta network composition is in stark contrast to the spinal network. The delta network is composed of a high density of links in relation to the number of nodes; it consists of numerous paths, sequences, or links achieving maximal connectivity. The dominant shape pattern in the delta network is the triangle, formed by each set of three nodes. When a node is added to a delta network of three or more nodes, two new links are required, so the relationship $e = 2v - 3$ always holds. Since this relationship remains constant, the gamma index will be

$$\gamma = \frac{e}{3(v-2)} = \frac{2v - 3}{3(v - 2)}$$

and the alpha index will always be

$$\alpha = \frac{e - v + 1}{2v - 5} = \frac{(2v - 3) - v + 1}{2v - 5} = \frac{v - 2}{2v - 5}$$
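Substituting $e = v - 1$ and $e = 2v - 3$ into the gamma and alpha formulas gives closed forms for the spinal and delta patterns. The hypothetical helper functions below evaluate those closed forms for a chosen number of nodes, assuming a planar network of more than two nodes; evaluating them for increasingly large $v$ shows the spinal gamma approaching 1/3, the delta gamma approaching 2/3, and the delta alpha approaching 1/2, the limiting values that matter when setting cutoffs between the classical patterns.

```python
from fractions import Fraction


def indices(e: int, v: int):
    """(gamma, alpha) for a planar network with e links and v > 2 nodes."""
    return Fraction(e, 3 * (v - 2)), Fraction(e - v + 1, 2 * v - 5)


def spinal_indices(v: int):
    """Spinal (minimally connected) pattern: e = v - 1."""
    return indices(v - 1, v)


def delta_indices(v: int):
    """Delta (triangle-based) pattern as described in the text: e = 2v - 3."""
    return indices(2 * v - 3, v)


if __name__ == "__main__":
    print("v      spinal gamma  delta gamma  delta alpha")
    for v in (4, 8, 100, 10_000):
        sg, _ = spinal_indices(v)
        dg, da = delta_indices(v)
        print(f"{v:<6} {float(sg):.3f}         {float(dg):.3f}        {float(da):.3f}")
```

For the spinal pattern the alpha component is always 0, which is why it is discarded in the loop.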

The third type of network configuration is the grid. The grid network is the transitional form that lies between the spinal network and the delta network, intermediate between the minimal and the maximal network.

To categorize a transport network as spinal, delta, or grid, cutoff values must be

established. By determining a scale of alpha and gamma indices for each of the network

types, the ranges can be established for each category.

The largest and smallest gamma values are used to identify a spinal network. Taaffe and Gauthier express the gamma index for a spinal network configuration as $\frac{v-1}{3(v-2)}$. Alternate expressions given include

$$\frac{1}{3}\left(\frac{v-1}{v-2}\right) \quad \text{and} \quad \frac{1}{3}\left(\frac{v}{v-2} - \frac{1}{v-2}\right)$$

Taaffe and Gauthier suggest that, for networks containing a large number of nodes, $\frac{v}{v-2}$ will approach 1 and $\frac{1}{v-2}$ will approach zero. In sum, this means that the expression will approach $\frac{1}{3}(1 - 0) = \frac{1}{3}$. At the other end, for the smallest network considered ($v = 4$), the value of the gamma index will be $\frac{3}{3(2)} = \frac{1}{2}$. This means that spinal networks can be categorized within a range of values from 1/3 to 1/2, or .333 to .5.

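For the delta network configuration, housing maximal connectivity, the gamma index is $\frac{2v-3}{3(v-2)}$, or $\frac{1}{3}\left(\frac{2v-3}{v-2}\right)$. As $v$ grows large, $\frac{2v-3}{v-2}$ approaches 2, so the gamma index approaches 2/3; for a single triangle ($v = 3$) it equals 1. Delta networks will therefore have a range of values between 2/3 and 1.0. Table 3-2 shows the range of values for the three classical networks.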


Table 3-2. Range of values for the gamma index for three classical network patterns

Pattern    Gamma range          Nodes
Spinal     1/3 ≤ γ ≤ 1/2        v ≥ 4
Grid       1/2 < γ < 2/3        v ≥ 4
Delta      2/3 < γ ≤ 1.0        v ≥ 3

Considering network examples A through D, we can see how the network has changed from a spinal network to a delta network. In the first stage of the network, the gamma value is .22. This is expected: with fewer than $v - 1$ links, some nodes are still isolated, and the network has not yet reached even minimal connectivity. In the second stage, the gamma value is higher, .39, indicating a more connected network. In the third stage, the connectivity value is even higher, at .61. The fourth and final stage of the network has a gamma value of .78; by this stage, the network clearly displays the triangle configuration that is characteristic of the delta network.

The alpha index can also be used to define network configuration. As discussed previously, the absence of circuits means the alpha index value will be zero; thus, the alpha value for a spinal network will be zero. The alpha value range for grid or delta networks depends on how many circuits exist. By defining limits of the alpha range, it will be possible to determine the network configuration. As defined by Taaffe and Gauthier, the alpha index for the delta configuration is

$$\alpha = \frac{(2v - 3) - v + 1}{2v - 5} = \frac{v - 2}{2v - 5}$$

As $v$ grows, this ratio approaches 1/2 from above, and for a single triangle ($v = 3$) it equals 1. The alpha index ranges for the three classical network patterns are presented in Table 3-3.

Table 3-3. Range of values for the alpha index for three classical network patterns
Pattern   Range of alpha     Condition
Spinal    α = 0              v = e + 1
Grid      0 < α < .50        v > 3
Delta     .50 < α ≤ 1.0
The two indices, gamma and alpha, complement each other in network measurement. As a network increases in spatial complexity, the two indices change in similar ways. Looking again at the example network, the second stage of the network corresponds to a spinal network. This is consistent with its alpha value of 0, the value for spinal networks. The alpha value for the third stage is .36, categorizing it as a grid network. The gamma value for the third stage is .61, also denoting a grid network. In the fourth stage of the network, both values fall within the delta configuration: the alpha value is .63 and the gamma value is .78. Together, the gamma and alpha indices describe the network in terms of its connectivity and circuitry.
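The two indices and the ranges in Tables 3-2 and 3-3 can be expressed as a short Python sketch. The function names, and the use of the table ranges as hard cut-offs between patterns, are assumptions made for illustration; p counts separate subgraphs and defaults to a single connected network.

```python
def gamma_index(v: int, e: int) -> float:
    """Observed links divided by the planar maximum number of links, 3(v - 2)."""
    if v < 3:
        raise ValueError("the gamma index needs at least three nodes")
    return e / (3 * (v - 2))


def alpha_index(v: int, e: int, p: int = 1) -> float:
    """Observed circuits (e - v + p) divided by the planar maximum, 2v - 5.

    p is the number of separate subgraphs; p = 1 means one connected network.
    """
    if v < 3:
        raise ValueError("the alpha index needs at least three nodes")
    return (e - v + p) / (2 * v - 5)


def classify_gamma(gamma: float) -> str:
    """Name the classical pattern, using the Table 3-2 ranges as cut-offs."""
    if gamma <= 1 / 2:
        return "spinal"
    if gamma < 2 / 3:
        return "grid"
    return "delta"


def classify_alpha(alpha: float) -> str:
    """Name the classical pattern, using the Table 3-3 ranges as cut-offs."""
    if alpha <= 0:
        return "spinal"
    if alpha < 0.5:
        return "grid"
    return "delta"


# A six-node, five-link tree: spinal, with no circuits.
print(round(gamma_index(6, 5), 3), alpha_index(6, 5))  # 0.417 0.0
# Index values quoted above for the third and fourth stages.
print(classify_gamma(0.61), classify_alpha(0.36))  # grid grid
print(classify_gamma(0.78), classify_alpha(0.63))  # delta delta
```

Both indices assume a planar network, which is why their denominators are the planar maxima 3(v - 2) links and 2v - 5 circuits.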

Five graph-theory measures were introduced in the preceding text: number of nodes, number of links, alpha index, gamma index, and number of circuits. Diameter is a sixth measure, which has not yet been defined. Diameter is a measure of the span of transportation networks, defined as the minimum number of links required to connect the two most distant nodes of a network. Hence, a diameter of five would indicate that the shortest path between the two most distant nodes contains five links, so that no pair of nodes is more than five links apart. For example, in network B (Figure 3-2) the diameter is 4: no two nodes in Figure 3-2 are separated by more than four links. While these indices are useful descriptive tools, it is necessary to remember that graph theory does not capture many of the complexities found in practice. In short, graph theory simplifies a network and analyzes its
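Because the diameter is the largest of the shortest link counts between pairs of nodes, it can be found with a breadth-first search from every node. The following is a minimal Python sketch, assuming a connected network stored as an adjacency list; the dictionary layout and function names are illustrative choices, and the example uses the six-node network tabulated in Table 3-6.

```python
from collections import deque


def bfs_link_counts(adjacency, source):
    """Fewest links from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(adjacency):
    """Largest shortest-path link count over all node pairs of a connected network."""
    longest = 0
    for source in adjacency:
        dist = bfs_link_counts(adjacency, source)
        if len(dist) != len(adjacency):
            raise ValueError("diameter is undefined for a disconnected network")
        longest = max(longest, max(dist.values()))
    return longest


# Six-node example network (links 1-2, 2-5, 3-4, 4-5, 5-6).
network = {
    1: [2],
    2: [1, 5],
    3: [4],
    4: [3, 5],
    5: [2, 4, 6],
    6: [5],
}
print(diameter(network))  # 4: nodes 1 and 3 are four links apart
```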


Connectivity Matrix

By measuring the accessibility of a node, we can determine the hierarchy, or the system of competition, that may exist between the nodes in a given network. The addition of links, or the destruction or removal of nodes or links, will affect the entire network. Changes of this type also alter the accessibility or hierarchy of nodes. Graph theory is used to measure these changes, as well as to determine whether a hierarchy or system of nodes exists as defined in terms of connectivity. Just as any network can be represented as a graph, it can also be represented as a matrix, and doing so allows numerous questions concerning the network's accessibility to be answered.

Traditionally, the origin nodes of a network are represented in the horizontal rows of a matrix and the destination nodes in the vertical columns. The number of rows and columns in the matrix must be identical. Relationships between the nodes are represented by the corresponding cells. The points (nodes) of the graph are labeled, and the labels are used to identify both the rows and the columns of the matrix. When two points in the network are connected, this link is represented by placing a non-zero number (typically a value of 1, in the case of a binary connectivity matrix) at the intersection of the relevant row and column. If there is no connection, a zero is placed at the intersection. Table 3-4 represents a network as a graph and in its matrix format. The number of nodes in the network illustrated is represented by both the number of columns and the number of rows in the matrix. A non-zero number represents a link between the corresponding nodes, and the absence of a direct link is represented by a zero. Also, the connection of a node to itself has no value, so a zero is recorded in the corresponding cells. For example, the cell at the intersection of row 2 and column 2 contains a zero. This matrix only gives information on the presence or absence of a direct connection between nodes.
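The matrix construction described above can be sketched in Python. The function name and the edge-list input are assumptions for illustration; the example links are read from the six-node matrix in Table 3-6.

```python
def binary_connectivity_matrix(labels, links):
    """Symmetric 0/1 matrix: 1 where two nodes share a direct link, else 0.

    Rows (origins) and columns (destinations) use the same label order, and
    the main diagonal stays 0 because a node's link to itself has no value.
    """
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for a, b in links:
        if a == b:
            continue  # self-links are not recorded
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1  # undirected link: fill both cells
    return matrix


# Links of the six-node example network in Table 3-6.
labels = ["V1", "V2", "V3", "V4", "V5", "V6"]
links = [("V1", "V2"), ("V2", "V5"), ("V3", "V4"), ("V4", "V5"), ("V5", "V6")]
for label, row in zip(labels, binary_connectivity_matrix(labels, links)):
    print(label, *row)
```

Filling both cells for each link keeps the matrix symmetric, which is what allows the row totals discussed below to be read as numbers of direct links.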

Table 3-4. Example of a binary connectivity matrix
Columns follow the same city order as the rows (Albuquerque through Washington).

Albuquerque    0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Atlanta        0 0 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0 1 0 1 1
Birmingham     0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Boise          0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0
Buffalo        0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0
Charlotte      0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
Chicago        0 1 0 1 0 0 0 1 1 1 1 1 1 0 1 0 0 1 1 0 1
Cincinnati     0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1
Cleveland      0 1 0 0 1 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1
Houston        1 1 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0 1 0 1 1
Kansas City    0 1 0 0 0 0 1 0 1 0 0 0 1 0 1 1 0 1 1 0 1
Las Vegas      0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0
Los Angeles    0 1 0 0 0 0 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1
New Orleans    0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
New York       0 1 0 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1 1 0 1
Oklahoma City  0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
San Diego      0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0
San Francisco  0 1 0 0 0 1 1 1 0 1 1 1 1 0 1 0 1 0 1 0 1
Seattle        0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 1 0 0 1
Tampa          0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1
Washington     0 1 0 0 0 1 1 1 1 1 1 0 1 0 1 0 0 1 1 1 0


Nodal accessibility can be derived directly from the binary connectivity matrix. This is the simplest measure of a node's accessibility. Summing each row of the matrix gives the number of direct links between the given node and other nodes. The higher the row total, the more accessible the given node is to other nodes in the network. Taaffe and Gauthier (1973) use an air transportation network to demonstrate the accessibility of a node (p. 119). In their network, both New York and Chicago have direct flights to each of the other cities in the network. This means that the accessibility of New York and Chicago is greater than that of the other cities in the network. The long-haul fiber data acquired for this project will be used to demonstrate a similar example. A random sample of twenty-one metropolitan areas is used.

Table 3-6. Network represented in matrix form
Nodes  V1 V2 V3 V4 V5 V6
V1      0  1  0  0  0  0
V2      1  0  0  0  1  0
V3      0  0  0  1  0  0
V4      0  0  1  0  1  0
V5      0  1  0  1  0  1
V6      0  0  0  0  1  0

The matrix gives information on the presence of links between each pair of cities. For example, there is a direct link between Atlanta and Birmingham, so a "1" appears at the intersection of Atlanta and Birmingham. The matrix also records the absence of direct connections: there is no direct link between Boise and Albuquerque, so a value of "0" appears at the intersection of Boise and Albuquerque (Table 3-4). The matrix also tells us that New York is directly connected to eleven other cities in the sample network.

Table 3-7 ranks the twenty-one cities of Table 3-4 in a hierarchy based on their direct long-haul fiber links. Atlanta, Chicago, San Francisco, and Washington, DC, are the most directly accessible nodes in the network measured: summing the corresponding row for each of these cities gives a total of twelve direct links. Based only on direct linkages, these cities rank at the top of the hierarchy of nodes.

Table 3-7. Hierarchy of network cities
City Rank Number of links
Atlanta 1 12
Chicago 1 12
San Francisco 1 12
Washington 1 12
New York 5 11
Houston 6 10
Los Angeles 6 10
Kansas City 8 9
Cleveland 9 7
Seattle 9 7
Las Vegas 11 5
Cincinnati 12 4
New Orleans 12 4
San Diego 12 4
Tampa 12 4
Boise 16 3
Charlotte 16 3
Birmingham 18 2
Buffalo 18 2
Oklahoma City 18 2
Albuquerque 21 1
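The row totals and the ranking in Table 3-7 can be sketched in Python. Table 3-7 gives tied cities the same rank and skips the following ranks (1, 1, 1, 1, 5, ...); the sketch reproduces that convention. The function names are illustrative, and the example input is the small matrix of Table 3-6 rather than the full city matrix.

```python
def nodal_degree(matrix):
    """Row totals of a binary connectivity matrix (direct links per node)."""
    return [sum(row) for row in matrix]


def competition_rank(labels, totals):
    """Rank nodes from most to fewest direct links.

    Tied nodes share the best available rank and the next rank is skipped
    (1, 1, 1, 1, 5, ...), the convention followed in Table 3-7.
    """
    # sorted() is stable, so tied nodes keep their input order.
    ordered = sorted(zip(labels, totals), key=lambda item: item[1], reverse=True)
    ranked = []
    for position, (label, total) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == total:
            rank = ranked[-1][1]  # tie: reuse the previous rank
        else:
            rank = position
        ranked.append((label, rank, total))
    return ranked


# Six-node example matrix from Table 3-6 (rows V1..V6).
C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
labels = ["V1", "V2", "V3", "V4", "V5", "V6"]
for label, rank, total in competition_rank(labels, nodal_degree(C)):
    print(label, rank, total)
```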

Much more can be derived from the matrix; however, this measure has limitations, as it only reveals the presence or absence of links and not their capacity to support flows. Though a node may have a high connectivity level based on direct connections, it may rank lower in the network hierarchy when indirect connections are included in the measurement of accessibility.

Indirect connections can be counted using matrix multiplication. In this method, each row of one matrix is multiplied element by element with each column of another matrix, and the sum of these products is recorded in the corresponding cell of the new matrix. The original matrix is first multiplied by itself; the product is then multiplied by the original matrix to obtain the next product, which is in turn multiplied by the original matrix, and so on.
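The repeated row-by-column multiplication can be sketched in plain Python without external libraries. The example input is assumed to be the six-node matrix of Table 3-6, with list index 0 standing for node V1; the function name is illustrative.

```python
def matrix_multiply(a, b):
    """Row-by-column product: cell (i, j) is the sum over k of a[i][k] * b[k][j]."""
    n_inner = len(b)
    if any(len(row) != n_inner for row in a):
        raise ValueError("the columns of a must match the rows of b")
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
C2 = matrix_multiply(C, C)   # counts of two-link paths (these may revisit nodes)
C3 = matrix_multiply(C2, C)  # counts of three-link paths
print(C2[0][4], C2[2][3])    # 1 0: nodes 1-5 have a two-link path, nodes 3-4 do not
```

Because the counts include paths that double back on themselves, the diagonal of C2 is non-zero; these are the redundancies discussed later in this chapter.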

Figure 3-6 illustrates the matrix multiplication of the matrix introduced in Table 3-6. The matrix is first multiplied by itself (C·C) to produce C². For each cell of C², the value is

$c_{ij}^{(2)} = \sum_{k} c_{ik}\, c_{kj}$,

where each two-link path from node i to node j is made up of the links $c_{ik}$ and $c_{kj}$ (Taaffe & Gauthier, p. 122). The original graph is shown in the top left corner of the graphic, and the resultant matrix, C², is shown in the bottom right corner. The presence or absence of two-link paths can be determined from the resultant matrix C²: two-link paths are represented by non-zero entries, and a zero indicates that no path of exactly two links exists between that pair of nodes.

Matrix C
Nodes  V1 V2 V3 V4 V5 V6
V1      0  1  0  0  0  0
V2      1  0  0  0  1  0
V3      0  0  0  1  0  0
V4      0  0  1  0  1  0
V5      0  1  0  1  0  1
V6      0  0  0  0  1  0

Matrix C² (C · C)
Nodes  V1 V2 V3 V4 V5 V6
V1      1  0  0  0  1  0
V2      0  2  0  1  0  1
V3      0  0  1  0  1  0
V4      0  1  0  2  0  1
V5      1  0  1  0  3  0
V6      0  1  0  1  0  1

Figure 3-6. Matrix multiplication

The resultant matrix, C², tells us which pairs of nodes have two-link paths connecting them. For example, nodes 1 and 5 are connected by a two-link path. If we look at the original matrix (C), we find that there is no direct connection between these two nodes. Nodes 3 and 4, by contrast, share a direct link but no two-link path. Note that the most distant nodes in the network are 1 and 3.

If the new matrix (C²) is multiplied by the original matrix (C), the number of three-link paths will be identified in the product matrix. Figure 3-7 shows the matrices used to determine the three-link paths in the network, as well as the original network. The new matrix, C³, gives the number of three-link paths between each pair of nodes; it was produced by multiplying the rows of the original matrix (C) by the columns of the second matrix (C²). More pairs of nodes in the network are connected by three-link paths than by either two-link paths or direct links. By repeating this multiplication, paths of any number of links can be determined.

By adding the numbers of paths of each length between nodes in the network (direct, two-link, three-link, and so on up to the network's diameter), the accessibility matrix, A, can be created. Figure 3-7 illustrates each matrix added to create the accessibility matrix (A), as well as the resulting accessibility matrix itself. Summing each row of the accessibility matrix yields a column vector whose entries give the accessibility of each node in the network; Figure 3-7 illustrates these row sums and the resultant vector. The typical element of A is

$a_{ij} = \sum_{k=1}^{d} c_{ij}^{(k)}$,

that is, summing the typical elements $c_{ij}^{(k)}$ of the matrices up to the power d yields the typical elements $a_{ij}$ of the accessibility matrix (see Figure 3-7). Here d is the diameter: a diameter of four represents the number of links between the two most distant nodes, as nodes 1 and 3 are separated by four links.

If the nodes were then ranked based on their accessibility values, node 5 would be the most accessible, as it has the highest value: summing across the rows of the accessibility matrix, row 5 produces the highest total. This means that node 5 is the most connected and accessible node in the network. Nodes 2 and 4 have the second highest value, followed by node 6. Finally, the two least accessible nodes in the network are 1 and 3; their row totals produced the lowest values.
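The accessibility computation can be reproduced with a short Python sketch. It assumes the six-node matrix of Table 3-6 as input and the diameter d = 4 stated in the text; the function names are illustrative only. The printed totals are the row totals and network totals shown in Figure 3-7.

```python
def matmul(a, b):
    """Plain row-by-column matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def powers_up_to(c, d):
    """Return the list [C, C^2, ..., C^d]."""
    result = [c]
    for _ in range(d - 1):
        result.append(matmul(result[-1], c))
    return result


def accessibility_matrix(c, d):
    """A = C + C^2 + ... + C^d, summed cell by cell."""
    n = len(c)
    a = [[0] * n for _ in range(n)]
    for power in powers_up_to(c, d):
        for i in range(n):
            for j in range(n):
                a[i][j] += power[i][j]
    return a


C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
d = 4  # diameter of the example network

A = accessibility_matrix(C, d)
accessibility = [sum(row) for row in A]        # row totals of A
Cd = powers_up_to(C, d)[-1]
connectivity = [sum(row) for row in Cd]        # row totals of C^d
print(accessibility, sum(accessibility))  # [14, 28, 14, 28, 38, 20] 142
print(connectivity, sum(connectivity))    # [7, 15, 7, 15, 19, 11] 74
```

Both vectors rank node 5 first, nodes 2 and 4 next, then node 6, and nodes 1 and 3 last.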

Though it is possible to continue multiplying the matrices, no path can contain more than v - 1 links without revisiting a node and thus becoming redundant. Multiplication is usually stopped when the matrix has been powered to the network's diameter, because at that power each node in the network is connected to every other node by at least one path. The matrix is powered to its diameter, in this case d = 4, to ensure that the summation of $c_{ij}^{(k)}$ over all matrices up to that power will yield non-zero entries in the resultant matrix. The greatest number of links between any two nodes is four. This does not mean that there will be no redundancies in the network. Powering up to the diameter ensures that all of the nodes are connected to each other node in the network by some path, be it direct or indirect (via multiple links or

Matrix C
Nodes  V1 V2 V3 V4 V5 V6
V1      0  1  0  0  0  0
V2      1  0  0  0  1  0
V3      0  0  0  1  0  0
V4      0  0  1  0  1  0
V5      0  1  0  1  0  1
V6      0  0  0  0  1  0

Matrix C²
Nodes  V1 V2 V3 V4 V5 V6
V1      1  0  0  0  1  0
V2      0  2  0  1  0  1
V3      0  0  1  0  1  0
V4      0  1  0  2  0  1
V5      1  0  1  0  3  0
V6      0  1  0  1  0  1

Matrix C³
Nodes  V1 V2 V3 V4 V5 V6
V1      0  2  0  1  0  1
V2      2  0  1  0  4  0
V3      0  1  0  2  0  1
V4      1  0  2  0  4  0
V5      0  4  0  4  0  3
V6      1  0  1  0  3  0

Matrix C⁴
Nodes  V1 V2 V3 V4 V5 V6
V1      2  0  1  0  4  0   Σ = 7
V2      0  6  0  5  0  4   Σ = 15
V3      1  0  2  0  4  0   Σ = 7
V4      0  5  0  6  0  4   Σ = 15
V5      4  0  4  0 11  0   Σ = 19
V6      0  4  0  4  0  3   Σ = 11
Network Connectivity = 74

Accessibility Matrix (A)
Nodes  V1 V2 V3 V4 V5 V6
V1      3  3  1  1  5  1   Σ = 14
V2      3  8  1  6  5  5   Σ = 28
V3      1  1  3  3  5  1   Σ = 14
V4      1  6  3  8  5  5   Σ = 28
V5      5  5  5  5 14  4   Σ = 38
V6      1  5  1  5  4  4   Σ = 20
Network Accessibility = 142

Figure 3-7. Three-linkage paths and the accessibility matrix
Note: Accessibility matrix (A) is a composite of C, C2, C3, and C4, based on the sum of the typical elements of these matrices.

These techniques first appeared in a study of the Interstate Highway System in the United States (Garrison 1960). Garrison's study is noted by Taaffe and Gauthier in their 1973 publication, Geography of Transportation. Garrison, as well as Taaffe and Gauthier, raise concerns over the redundancies that arise from matrix multiplication, as they may not represent the distance-minimizing behavior of networks. In response to these concerns, a shortest-path matrix procedure was discussed. This procedure was introduced by Alfonso Shimbel in 1953. Shimbel's procedure computes accessibility in terms of the distance between nodes, thereby eliminating redundancies: it calculates the length of the shortest path between nodes. Taaffe and Gauthier (1973) suggest that for many real-world problems redundancies are of no importance (p. 132). For this particular research, however, redundancy in long-haul fiber networks is directly relevant. The majority of the links in this network are redundant, which must be considered in the practical application of link removals due to disturbance. These redundancies help ensure that a network keeps functioning should it experience a disturbance. They also allow for increased traffic and data transfer during normal conditions.

To review, the connectivity of a node is determined by summing across its row of the resultant matrix, and the connectivity of the entire network is determined by summing all of the row totals (see Matrix C4, Network Connectivity, in Figure 3-7). The accessibility and connectivity measures rank nodal importance in a nearly identical way (compare the row totals of C4 and A in Figure 3-7). Both accessibility and connectivity measures have been demonstrated in this chapter. However, the following analysis concentrates on the connectivity of nodes in order to determine which nodes are best connected and most important to the network, rather than which nodes are most accessible.

Weighted or Valued Graphs and Shortest-Path

Networks can also be represented as "valued graphs," in which weights are assigned to the network's links to express their capacity, flow potential, or distance. If we take the original graph and assign weights to its links, we have a weighted network. Figure 3-8 illustrates the network as a weighted graph, with values assigned to each link, and Table 3-8 represents the weighted network as a matrix.

The values added to the weighted network represent the distance between the corresponding pairs of nodes. In Table 3-8, the length (or value) of each existing direct connection between a pair of nodes is recorded in the corresponding matrix cell. For pairs of nodes that have no direct connection, ∞ (infinity) is recorded in the cell. Connection between a node and itself is meaningless, so those cells have a zero recorded; these self-connection cells always fall along the main diagonal of the matrix. From the matrix we can tell the distance between nodes in direct connections, and also in indirect connections. Traveling from node 1 to node 2 covers a distance of 20 via a direct connection. Traveling from node 3 to node 5, however, requires an indirect connection through node 4. The distance from node 3 to node 4 is 20, and the distance from node 4 to node 5 is 10, so the total distance from node 3 to node 5 is 30.



Figure 3-8. Weighted network

Table 3-8. Weighted network represented as a matrix
Nodes  V1   V2   V3   V4   V5   V6
V1      0   20    ∞    ∞    ∞    ∞
V2     20    0    ∞    ∞   30    ∞
V3      ∞    ∞    0   20    ∞    ∞
V4      ∞    ∞   20    0   10    ∞
V5      ∞   30    ∞   10    0    5
V6      ∞    ∞    ∞    ∞    5    0

To answer accessibility questions about more complex matrices, a procedure similar to matrix multiplication is required. The new procedure is less complex than matrix multiplication: element-by-element addition (x ⊗ y = x + y) replaces element-by-element multiplication, and instead of summing the values, the minimum value is inserted in the appropriate cell of the new matrix [x ⊕ y = min(x, y)]. The value of cell ij becomes the minimum of the sums of the two-stage links from origin i to k and then to destination j:

$c_{ij}^{(2)} = \min_{k}\,(c_{ik} + c_{kj})$.

The corresponding equation for summing the products in matrix multiplication was $c_{ij}^{(2)} = \sum_{k} c_{ik}\, c_{kj}$. Figure 3-9 illustrates the element-by-element addition and the determination of the minimum sum, and indicates where the new value is inserted in the new matrix, L². In the example shown in Figure 3-9, the least-path linkage between nodes 1 and 5 runs through node 2.

The two-step path that connects nodes 1 and 5 has a length of 50. By using this methodology, the successive powers of the weighted matrix are calculated, and the results indicate the minimum distance required between each pair of nodes. When a power m has been reached beyond which further powering no longer changes the matrix, a matrix of minimum distances has been achieved; for a connected network, no ∞ entries then remain, and each cell off the main diagonal holds the minimum distance between that pair of nodes (Taaffe & Gauthier, p. 141). This method has been compared to Shimbel's minimum connectivity procedure, which involves powering the matrix until there are no zero connections left in the matrix (Shimbel 1953, Taaffe & Gauthier 1973). Taaffe and Gauthier compared the addition method with Shimbel's method and found that the structural relationships do not change, but that the distance criteria might give a more refined measurement of nodal

Row V1 and column V5 of the weighted matrix (Table 3-8), combined element by element:
(V1-V1) + (V1-V5) = 0 + ∞ = ∞
(V1-V2) + (V2-V5) = 20 + 30 = 50
(V1-V3) + (V3-V5) = ∞ + ∞ = ∞
(V1-V4) + (V4-V5) = ∞ + 10 = ∞
(V1-V5) + (V5-V5) = ∞ + 0 = ∞
(V1-V6) + (V6-V5) = ∞ + 5 = ∞
Minimum = 50, entered in row V1, column V5 of L².

Figure 3-9. Indirect Connections in a Weighted Network
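The min-plus procedure can be sketched in Python. The weights are those of Table 3-8 (V1-V2 = 20, V2-V5 = 30, V3-V4 = 20, V4-V5 = 10, V5-V6 = 5); because parts of that table were reconstructed from damaged text, treat them as an assumption. `math.inf` stands in for ∞, and the stopping rule used here (stop when another power no longer changes the matrix) is an assumed, commonly used criterion rather than a quotation of Taaffe and Gauthier.

```python
import math

INF = math.inf


def min_plus_product(a, b):
    """Cell (i, j) is the minimum over k of a[i][k] + b[k][j]: addition replaces
    multiplication, and the minimum replaces the sum."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shortest_distances(weights):
    """Power the weighted matrix until it stops changing.

    With a zero diagonal, each power keeps the shorter paths already found, so
    the loop ends after at most v - 1 products."""
    current = weights
    while True:
        nxt = min_plus_product(current, weights)
        if nxt == current:
            return current
        current = nxt


# Table 3-8: INF marks node pairs with no direct link; rows and columns V1..V6.
L = [
    [0, 20, INF, INF, INF, INF],
    [20, 0, INF, INF, 30, INF],
    [INF, INF, 0, 20, INF, INF],
    [INF, INF, 20, 0, 10, INF],
    [INF, 30, INF, 10, 0, 5],
    [INF, INF, INF, INF, 5, 0],
]
L2 = min_plus_product(L, L)
print(L2[0][4], L2[2][4])  # 50 30: V1-V2-V5 and V3-V4-V5
D = shortest_distances(L)
print(D[0][2], D[0][5])    # 80 55: V1 to V3 and V1 to V6
```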

Node and Link Removal

To determine the degree of importance of any node to a network, the node can be removed from the network (Haggett, Cliff, & Frey 1977b, p. 322). The measures for the remaining network are then recalculated to determine changes in diameter, row totals ($R_i$), network connectivity ($C = \sum_i R_i$), and disconnects. The matrix is repowered after a node or pair of nodes has been removed, and the results are compared to the original network values. When a network has been powered to its diameter ($c_{ij} \neq 0$), the row sums of the resultant matrix indicate the connectivity of each node in the network. By summing the row totals, the connectivity of the entire network can be derived: the network connectivity index (NCI) is obtained by summing the powered matrix's row totals, $\text{NCI} = \sum_i R_i$. Nodes whose removal causes the largest changes in network connectivity exercise the greatest degree of importance within the network (Haggett, Cliff, & Frey 1977b).

The method is demonstrated using the network shown in Figure 3-10. The network comprises six nodes and seven links. The network was powered to a diameter of three, and the resultant matrix is shown in Figure 3-10. The NCI of the complete network is 34525000 (the sum of the row totals). Figure 3-11 illustrates the impact of node removal upon a network: node V3 was removed, reducing the network to five nodes and five links. This network was powered to a diameter of three. As expected, the sum of the rows of the resultant matrix was less than that of the complete network. With the removal of node V3, the connectivity of the network dropped to 14625000, a 57.6% decrease in the NCI value. Figure 3-12 illustrates the removal of a link from a network: the link connecting node V4 and node V5 was removed, and the remaining network was powered to a diameter of three. The row totals of the resultant matrix were summed to determine the connectivity of the network in the absence of this link. The connectivity drops to 24205000, a decrease of 30%. The percentage changes in NCI are relatively high for the examples illustrated in Figures 3-11 and 3-12; however, the network contains only a small number of links and nodes. The node removal reduced the number of nodes in the network by 16.7% (from 6 nodes to 5), while the link removal reduced the number of links in the network by 14% (from 7 links to 6).
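Node and link removal can be sketched in Python as follows. The link weights (V1-V2 = 20, V1-V4 = 25, V2-V5 = 30, V3-V4 = 20, V3-V5 = 20, V4-V5 = 10, V5-V6 = 5) are read from the partly damaged Figures 3-10 to 3-12 and should be treated as an assumption, as should the function names. The power k is a parameter; with these weights, k = 4 reproduces the NCI totals reported above (34525000, 14625000, and 24205000), so that value is used here.

```python
def matmul(a, b):
    """Row-by-column product of two square matrices (lists of lists)."""
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(len(a))]


def matrix_power(c, k):
    """Multiply C by itself until it reaches the k-th power."""
    result = c
    for _ in range(k - 1):
        result = matmul(result, c)
    return result


def valued_matrix(nodes, links):
    """Symmetric matrix of link weights, with 0 where no direct link exists."""
    index = {node: i for i, node in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for (a, b), weight in links.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = weight
    return m


def nci(nodes, links, k):
    """Network connectivity index: the sum of the row totals of C**k."""
    powered = matrix_power(valued_matrix(nodes, links), k)
    return sum(sum(row) for row in powered)


def percent_drop(before, after):
    """Percentage decrease from before to after."""
    return 100 * (before - after) / before


# Assumed weights for the six-node, seven-link example network.
nodes = ["V1", "V2", "V3", "V4", "V5", "V6"]
links = {
    ("V1", "V2"): 20, ("V1", "V4"): 25, ("V2", "V5"): 30, ("V3", "V4"): 20,
    ("V3", "V5"): 20, ("V4", "V5"): 10, ("V5", "V6"): 5,
}
K = 4  # power of the valued matrix (see the lead-in)

full = nci(nodes, links, K)

# Node removal: drop V3 and every link that touches it.
no_v3 = nci([n for n in nodes if n != "V3"],
            {pair: w for pair, w in links.items() if "V3" not in pair}, K)

# Link removal: drop only the V4-V5 link; all six nodes remain.
no_v4_v5 = nci(nodes, {pair: w for pair, w in links.items() if pair != ("V4", "V5")}, K)

print(full, no_v3, no_v4_v5)  # 34525000 14625000 24205000
print(round(percent_drop(full, no_v3), 1), round(percent_drop(full, no_v4_v5), 1))  # 57.6 29.9
```

The same two helpers (dropping a node with its links, or dropping a single link) are all that is needed to repeat the comparison for every node or link in a larger network.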


The methods discussed in this chapter can be used to assess changes in connectivity and accessibility, at node and network levels, in response to the removal of a link or node. This chapter has discussed the data, methodology, and analysis used in the following chapters. The U.S. Internet backbone network is analyzed in Chapters 4 and 5. Node removal and matrix multiplication, introduced in this chapter, are the main tools used in the analysis for both Chapters 4 and 5. Network changes will be measured for each removal scenario by comparing diameter, row totals, NCI, and ranking to those of the complete network. Chapter 4 describes the unweighted analysis of the U.S. Internet backbone network, in which single and paired node removal scenarios are performed to determine the effects upon the network. The analysis performed in Chapter 5 also uses matrix multiplication and node and link removal techniques, but adds weights to the network; using valued links achieves a better representation of the actual Internet backbone network.

Valued connectivity matrix (link weights; ∞ = no direct link)
Nodes  V1   V2   V3   V4   V5   V6
V1      0   20    ∞   25    ∞    ∞
V2     20    0    ∞    ∞   30    ∞
V3      ∞    ∞    0   20   20    ∞
V4     25    ∞   20    0   10    ∞
V5      ∞   30   20   10    0    5
V6      ∞    ∞    ∞    ∞    5    0

Powered matrix
Nodes  V1       V2       V3       V4       V5       V6       Row total
V1     2023125  300000   1082500  440000   2182500  50000    6078125
V2     300000   2712500  1435000  2067500  440000   298750   7253750
V3     1082500  1435000  1340000  950000   950000   182500   5940000
V4     440000   2067500  950000   2108125  1060000  197500   6823125
V5     2182500  440000   950000   1060000  2953125  40000    7625625
V6     50000    298750   182500   197500   40000    35625    804375
Network Connectivity Index (NCI) = 34525000

Figure 3-10. Example of a complete network

Valued connectivity matrix with node V3 removed (∞ = no direct link)
Nodes  V1   V2   V4   V5   V6
V1      0   20   25    ∞    ∞
V2     20    0    ∞   30    ∞
V4     25    ∞    0   10    ∞
V5      ∞   30   10    0    5
V6      ∞    ∞    ∞    5    0

Powered matrix
Nodes  V1       V2       V4       V5       V6       Row total
V1     1773125  0        0        1742500  0        3515625
V2     0        2352500  1627500  0        238750   4218750
V4     0        1627500  1168125  0        157500   2953125
V5     1742500  0        0        1773125  0        3515625
V6     0        238750   157500   0        25625    421875
Network Connectivity Index (NCI) = 14625000

Figure 3-11. Example of a node removed from a network

Valued connectivity matrix with link V4-V5 removed (∞ = no direct link)
Nodes  V1   V2   V3   V4   V5   V6
V1      0   20    ∞   25    ∞    ∞
V2     20    0    ∞    ∞   30    ∞
V3      ∞    ∞    0   20   20    ∞
V4     25    ∞   20    0    ∞    ∞
V5      ∞   30   20    ∞    0    5
V6      ∞    ∞    ∞    ∞    5    0

Powered matrix
Nodes  V1       V2       V3       V4       V5       V6       Row total
V1     1660625  300000   912500   240000   1410000  50000    4573125
V2     300000   2322500  1275000  1162500  200000   258750   5518750
V3     912500   1275000  1260000  300000   300000   172500   4220000
V4     240000   1162500  300000   1460625  940000   75000    4178125
V5     1410000  200000   300000   940000   2275625  0        5125625
V6     50000    258750   172500   75000    0        33125    589375
Network Connectivity Index (NCI) = 24205000

Figure 3-12. Example of a link removed from a network




Protection of critical infrastructure in the United States has been a prominent topic in recent months. One of the most pressing issues is the assurance and protection of critical infrastructure, particularly infrastructure that directly affects the economy and the financial sector. Major disturbances to the Internet's infrastructure have occurred that directly affected, among other things, the financial market. The fall of the Twin Towers during the terrorist attacks of 9/11 and the northeastern power outages of August 2003 each led to Internet disruption due to physical failure.

It must be noted that disturbances to physical Internet infrastructure occur frequently on small scales. These disturbances are usually accidental, often caused by backhoes and shovels. They are less publicized, in part because the disruption is minor and generally affects only local service.

September 11, 2001, brought the largest loss of physical telecommunication infrastructure (FCC 2001). Verizon alone lost its central office¹, along with 182,000 voice circuits, more than 1.6 million data circuits, and more than 11,000 lines serving Internet service providers (GAO 2003). With the loss of Verizon's central office, 34,000 businesses lost their telecommunication service. Verizon is just one example of a telecommunication provider that was dramatically affected by the terrorist attacks; many other telecommunication providers lost physical infrastructure. Though the exchange and clearing organizations were undamaged by the terrorist attacks, the economic disruption was severe due to the loss of telecommunication infrastructure (GAO 2003).

¹ A central office is a facility, owned and operated by a telecommunication firm, that houses the switching equipment linking customers to voice and data networks within and outside the service area.

A power blackout on August 14, 2003, simultaneously affected Detroit, Cleveland, Columbus, and Long Island, New York, as well as the Canadian cities of Toronto and Ottawa (Rosenblum 2003, Semple 2003, NASA 2003). Figures 4-1 and 4-2, courtesy of Chris Elvidge of the U.S. Air Force, have been made available by the NASA Earth Observatory in its collection of unique images. Figures 4-1 and 4-2 illustrate the widespread power outage. The top image was taken on August 14, 2003, roughly twenty hours before the blackout occurred. The lower image was taken on August 15, 2003, about seven hours after the blackout began: a post-September 11th reminder of the vulnerability of our infrastructure and the interconnectedness of our cities. A series of events caused a domino effect across interconnected power grids, and the result was a widespread outage (NASA 2003). The cities affected by the blackout experienced varying degrees of impact. The same type of domino effect has occurred within the nation's telecommunication infrastructure network. Telecommunication and computer equipment is often dependent upon the power grid, though there are exceptions where on-site power generation avoids grid dependency (particularly in data and colocation centers [interconnection hubs for networks]).

Sixty Hudson Street, located in the financial district in Manhattan, is one of the main hubs of Internet activity and interconnection in the world. Unlike other data and interconnection facilities, this building is reliant upon the power grid, and it weathered a major disturbance during the power outages of August 2003. With backup generators buying little time, many companies that housed equipment in the building were soon negatively affected (Careless 2003). Figure 4-1 further illustrates the interdependencies between Internet infrastructure and the power grid. Figure 4-3 shows a map of the Internet routing outages that occurred during the August 14, 2003 power outages. The interdependency of networks increases their vulnerability to disturbances, as illustrated by the terrorist attacks of 9/11, the power outages, and other disturbances.

Figure 4-1. Image taken August 14, 2003, 9:29 p.m. EDT, about 20 hours before
blackout (NASA Earth Observatory, 2003)

Figure 4-2. Image taken August 15, 2003, 9:14 p.m. EDT, about 7 hours after blackout
(NASA Earth Observatory 2003)

Figure 4-3. Internet routing outages during the 2003 blackout. Source: www.renesy.com

This chapter focuses on long-haul fiber, where, as with the power grids across the U.S., countless overlaps and interconnections combine to create one large network. The disturbance of any node holds the potential to disrupt the overall network: the more important a node is to the overall network, the more effect its disturbance will have upon the connectivity of the network. A disruption in the network can also have a major effect on sectors reliant upon that network. Economic sectors are increasingly dependent upon telecommunication infrastructure, particularly long-haul fiber. Should particular long-haul fiber routes endure a major disturbance, the economy would be directly affected. Detecting vulnerabilities is the beginning of protection.

This chapter introduces a methodology to identify the most critical links and nodes in the U.S. Internet backbone network. The analysis will proceed in two stages. First, an unweighted analysis will consider each link to be of equal importance within the network; in short, it will not take into account the amount of bandwidth that connects a node to other nodes. Weights will be added in Chapter 5, recognizing the amount of bandwidth each link represents. In this chapter, a method for identifying a hierarchy of nodes and links within an undirected, unweighted network is introduced. The method applies an unweighted scenario to the Internet backbone network in the United States using the graph-theoretic concepts discussed in Chapter 3. This chapter will assess the vulnerability of the long-haul fiber network in the U.S. in several ways:

* How node removal/disruption affects the nodality (and ranking) of all other nodes and changes the ranking of all remaining links in the network;

* How node removal/disruption affects the overall connectivity of the network, as measured by the sum total of all nodality indices after removal/disruption;

* How link removal/disruption affects the nodality (and ranking) of nodes;

* How link removal/disruption affects the overall connectivity of the network, as measured by the sum total of all nodality indices;

* Changes in the rank and importance of remaining links.

A total of 218 nodes representing cities in the U.S. (including Alaska and Hawaii)

were used for the unweighted and weighted analysis (Figure 4-4). Table 4-1 shows a

complete list of the C/MSAs used in the analysis. Basic network measurements were

performed using the Internet Backbone Network data to determine network connectivity.

A symmetrical binary connectivity matrix is used to represent the network. Because each link appears twice in a symmetrical matrix (once in each row of the two nodes it joins), the number of binary links represented in the matrix (1042) was divided in half (521) to identify the total number of unique links in the network and avoid double counting. This partially addresses the redundancy that occurs with the matrix representation of a network. The self-to-self links represented in the principal diagonal are not included in the totals, as it is assumed that nodes are not connected to themselves. Figure 4-5 illustrates the network links used in both the unweighted and weighted analyses. The gamma and alpha indices are defined as:

$$\gamma = \frac{\text{actual edges}}{\text{max edges}} = \frac{e}{e_{\max}} = \frac{e}{3(v-2)}$$

$$\alpha = \frac{\text{actual circuits}}{\text{max circuits}} = \frac{e - v + 1}{2v - 5}$$

where $e$ is the number of edges (links), $v$ is the number of vertices (nodes), and $3(v-2)$ and $2v-5$ are the maximum numbers of edges and circuits in a planar network.
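The halving step can be illustrated with a short Python sketch. It uses a hypothetical four-node network rather than the study data, and the helper names `build_matrix` and `count_unique_links` are illustrative, not part of the original analysis. Summing the off-diagonal entries and dividing by two mirrors the 1042 / 2 = 521 calculation.

```python
def build_matrix(n, edges):
    """Build a symmetric binary connectivity matrix for an undirected network."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1  # each link is recorded in both directions,
        matrix[j][i] = 1  # so it contributes two 1s to the matrix
    return matrix


def count_unique_links(matrix):
    """Count undirected links, ignoring the principal diagonal (self-to-self links)."""
    n = len(matrix)
    binary_links = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return binary_links // 2  # every link was counted twice


# Hypothetical four-node network with five links
m = build_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
print(count_unique_links(m))  # 5
```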

i 6t /


0 800 Miles

* Internet Backbone Node
SUnited States

Figure 4-4. Nodes of US. Internet backbone network
Note. Honolulu, HI & Anchorage, AK are not included in this map, but are Internet Backbone Nodes used in this analysis.

0 900 Miles

Figure 4-5. Links of U.S Internet backbone network

' Internet Backbone Link
SUnited States


Table 4-1. List of consolidated metropolitan statistical areas (CMSA) and metropolitan statistical areas (MSA) in U.S. Internet backbone network

Albany-Schenectady-Troy
Albuquerque
Alexandria
Allentown-Bethlehem-Easton
Altoona
Amarillo
Anchorage
Ardmore
Atlanta
Austin-San Marcos
Bakersfield
Baton Rouge
Bellevue
Billings
Birdstown
Birmingham
Bismarck
Blacksburg
Bohemia
Boise
Boonshill
Boston-Worcester-Lawrence
Bowling Green
Bridgeport
Brownsville-Harlingen-San Benito
Bryan-College Station
Buffalo-Niagara Falls
Casper
Cedar Knolls
Celina
Chapel Hill
Charleston-North Charleston
Charlotte
Charlotte-Gastonia-Rock Hill
Chattanooga
Cheyenne
Chicago-Gary-Kenosha
Chico-Paradise
Cincinnati-Hamilton
Clarkrange
Cleveland-Akron
Colorado Springs
Columbia
Columbus
Cookeville
Corpus Christi
Dallas-Fort Worth
Danville
Daytona Beach
Dayton-Springfield
Denver-Boulder-Greeley
Des Moines
Detroit-Ann Arbor-Flint
Devils Lake
Dibrell
Dickinson
Dunlap
Durand
Eau Claire
El Paso
Elkhart-Goshen
Emeryville
Erie
Estill Springs
Eugene-Springfield
Fargo-Moorhead
Fayetteville
Flat Creek
Florence
Fort Myers-Cape Coral
Fosterville
Freehold
Fresno
Gainesboro
Gainesville
Garden City
Gardena
Glenview
Grand Forks
Grand Rapids-Muskegon-Holland
Green Bay
Greensboro-Winston-Salem-High Point
Hamilton Square
Kalamazoo-Battle Creek
Kansas City
Lafayette
Lakeland-Winterhaven
Lancaster
Laredo
Las Vegas
Laurinburg
Lexington-Fayette
Lincoln
Little Rock
Livingston
Longview-Marshall
Los Angeles-Riverside-Orange County
Market Place
Melbourne-Titusville-Palm Bay
Miami-Fort Lauderdale
Minneapolis-St. Paul
New Bern
New Brunswick
New London-Norwich
New Market
New Orleans
New York-Northern New Jersey-Long Island
Norfolk-Virginia Beach-Newport News
Oklahoma City
Philadelphia-Wilmington-Atlantic City
Providence-Fall River-Warwick
Raleigh-Durham-Chapel Hill
Redwood City
Reno
Richmond-Petersburg
Roachdale
Roanoke
Roanoke Rapids
Rochelle Park
Rochester
Rocky Mount
Rolling Meadows
Sacramento-Yolo
Saint Augustine
Salt Lake City-Ogden
San Antonio
San Diego
San Francisco-Oakland-San Jose
San Luis Obispo-Atascadero-Paso Robles
Santa Barbara-Santa Maria-Lompoc
Scranton-Wilkes Barre-Hazleton
Sherman Oaks
Shreveport-Bossier City
South Bend
South Holland
St. Louis
Topeka
Tracy City
Tulsa
Tuscan
Victoria
Waco
Wartburg
Washington-Baltimore
Waterford
Wayne
West Haven
West Palm Beach-Boca Raton

The gamma and alpha indices were computed as standard measures of network connectivity. The long-haul fiber network contains 218 nodes (vertices) and 521 links (edges). The gamma index is the ratio of the number of links in a network to the maximum number of links possible in that network:

$$\gamma = \frac{e}{3(v-2)} = \frac{521}{3(218-2)} = \frac{521}{648} \approx 0.804$$

Hence, the gamma index for the Internet backbone network in the U.S. is 0.804. In terms of maximal connectivity, this means that the network has 80.4% of the maximum possible number of links.

The alpha index is a ratio measure of the number of actual circuits to the maximum number of circuits possible in the network:

$$\alpha = \frac{e - v + 1}{2v - 5} = \frac{521 - 218 + 1}{2(218) - 5} = \frac{304}{431} \approx 0.705$$

Like the gamma index, the alpha index ranges from 0 to 1. A value of 0 represents a minimally connected network, and a value of 1 represents a maximally connected network. The network linkage, or circuitry, is 70.5% of the maximum.
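The gamma and alpha calculations reduce to two one-line functions. This minimal Python sketch (the function names are our own) reproduces the values above from e = 521 and v = 218, using the planar-network maxima 3(v - 2) and 2v - 5 from the formulas.

```python
def gamma_index(e, v):
    """Actual links divided by the planar maximum 3(v - 2)."""
    return e / (3 * (v - 2))


def alpha_index(e, v):
    """Actual circuits (e - v + 1) divided by the planar maximum 2v - 5."""
    return (e - v + 1) / (2 * v - 5)


# Link and node counts for the U.S. long-haul fiber network
e, v = 521, 218
print(f"gamma = {gamma_index(e, v):.3f}")  # gamma = 0.804
print(f"alpha = {alpha_index(e, v):.3f}")  # alpha = 0.705
```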

Table 4-2 shows binary connection data for the Internet backbone network in the U.S. for 2003, which covers 218 metropolitan areas. The binary totals represent the number of nodes to which a node is directly linked. Chicago-Gary-Kenosha has 36 binary links, more than any other metropolitan area; that is, Chicago was directly connected to 36 other cities within the U.S. Internet backbone network. New York-Northern New Jersey-Long Island and Dallas-Fort Worth followed closely with 33 and 30 links, respectively. Washington-Baltimore ranks fourth, with 28 separate long-haul links to other metropolitan areas. Atlanta and San Francisco-Oakland-San Jose have 27 and 26 binary links, respectively, filling the fifth and sixth ranks. These cities have the most binary connections to other major metropolitan areas in the U.S.

Table 4-2. Long-haul fiber optic binary connections in U.S. metropolitan areas

Metropolitan Area (C/MSA)                    Binary Connections, 2003 (Rank)
Chicago-Gary-Kenosha                         36 (1)
New York-Northern New Jersey-Long Island     33 (2)
Dallas-Fort Worth                            30 (3)
Washington-Baltimore                         28 (4)
Atlanta                                      27 (5)
San Francisco-Oakland-San Jose               26 (6)
Denver-Boulder-Greeley                       19 (7)
Los Angeles-Riverside-Orange County          19 (7)
Cleveland-Akron                              18 (9)
Kansas City                                  18 (9)
Sacramento-Yolo                              16 (11)
Houston-Galveston-Brazoria                   15 (12)
Miami-Fort Lauderdale                        14 (13)
St. Louis                                    14 (13)
Boston-Worcester-Lawrence                    13 (15)
Nashville                                    13 (15)
Seattle-Tacoma-Bremerton                     13 (15)
Tampa-St. Petersburg-Clearwater              13 (15)
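The ranks in Table 4-2 follow the usual competition ranking convention: tied areas share a rank and the next rank is skipped (two areas at 7, then 9). A small Python sketch, with illustrative helper names, derives binary connections as row sums of the connectivity matrix and reproduces the table's ranks from its counts.

```python
def binary_connections(matrix):
    """Number of nodes each node is directly linked to (row sum, diagonal excluded)."""
    return [sum(row) - row[i] for i, row in enumerate(matrix)]


def competition_rank(values):
    """Rank in descending order; tied values share a rank and the next rank is skipped."""
    return [1 + sum(1 for other in values if other > value) for value in values]


# Binary connection counts from Table 4-2, in table order
counts = [36, 33, 30, 28, 27, 26, 19, 19, 18, 18, 16, 15, 14, 14, 13, 13, 13, 13]
print(competition_rank(counts))
# [1, 2, 3, 4, 5, 6, 7, 7, 9, 9, 11, 12, 13, 13, 15, 15, 15, 15]
```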

Matrix multiplication was the main tool used in the unweighted analysis. As explained and demonstrated in Chapter 3, matrix multiplication can help determine nodal accessibility and establish a ranking of nodes and links within a network. The connectivity matrix is a model of the network, allowing various scenarios to be simulated in order to learn more about the network. Matrix multiplication was used to determine the nodes most critical to the overall network, and various link-removal scenarios were carried out in the same way. Both single-node and paired-node removal scenarios were performed. Once the city ranking for the complete network was established, the nodes to be used in the single- and paired-node removal scenarios were identified. Each of the nodes ranked in the top 12 in terms of connectivity in the original, fully connected network was used in the single-node removal scenarios. Every possible pairing of the top 12 nodes was also removed from the network, accounting for 72 double-node removal scenarios. The new network measurements show the degree of a node's importance to the network and the effect of different removal scenarios on the entire network: diameter, disconnects, the Unweighted Relative Connectivity Index (URCI), the Network Connectivity Index (NCI), and the percentage change in connectivity.
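The scenario design can be sketched as follows, assuming each removal is modeled by deleting the removed nodes' rows and columns from the connectivity matrix. The helper names are hypothetical, and the matrix is a toy example rather than the study network; `itertools.combinations(top_nodes, 2)` yields each unordered pair of top-ranked nodes exactly once.

```python
from itertools import combinations


def remove_nodes(matrix, removed):
    """Delete the rows and columns of `removed` nodes; return kept indices and the reduced matrix."""
    removed = set(removed)
    keep = [i for i in range(len(matrix)) if i not in removed]
    reduced = [[matrix[i][j] for j in keep] for i in keep]
    return keep, reduced


def removal_scenarios(matrix, top_nodes):
    """Yield (removed, kept_indices, reduced_matrix) for every single node, then every pair."""
    for node in top_nodes:
        yield (node,), *remove_nodes(matrix, [node])
    for pair in combinations(top_nodes, 2):
        yield pair, *remove_nodes(matrix, pair)


# Hypothetical five-node network; nodes 0-3 stand in for the top-ranked nodes
m = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
]
for removed, keep, reduced in removal_scenarios(m, [0, 1, 2, 3]):
    print(removed, keep)
```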

For each removal scenario, the binary connectivity matrix was multiplied to its diameter; that is, the matrix was repeatedly multiplied until no zeros remained within it. In matrix multiplication, a zero represents a disconnect between two nodes, so the absence of zeros shows that each node in the network is connected to every other node. When the matrix reaches the network's diameter, all of the nodes in the network are connected. The matrix for the complete network was multiplied to its diameter, 11: the matrix was multiplied by itself 10 times before each node in the network was connected to each of the other nodes through some path.
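A minimal sketch of powering to the diameter follows. It assumes the accumulated form T_k = A + A^2 + ... + A^k common in accessibility analysis: because entry (i, j) of A^k counts walks of exactly k links, the first k for which every off-diagonal entry of T_k is nonzero equals the network's diameter. Whether Chapter 3 accumulates powers or checks A^k alone is not restated in this section, so treat this as one reasonable reading. Reaching power k takes k - 1 multiplications, which matches the 10 multiplications for a diameter of 11.

```python
def mat_mult(a, b):
    """Multiply two square integer matrices (Python ints cannot overflow)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def power_to_diameter(adj, max_steps=None):
    """Power `adj` until every pair of distinct nodes is linked by some walk.

    Assumes the accumulated form T_k = A + A^2 + ... + A^k. Returns
    (k, A^k, T_k); k is None if some pair never connects (a disconnect).
    """
    n = len(adj)
    limit = max_steps or n  # a connected network's diameter is at most n - 1
    power = [row[:] for row in adj]  # A^1
    total = [row[:] for row in adj]  # T_1
    for k in range(1, limit + 1):
        if all(total[i][j] > 0 for i in range(n) for j in range(n) if i != j):
            return k, power, total
        power = mat_mult(power, adj)  # A^(k+1)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return None, power, total


# Hypothetical four-node chain 0-1-2-3: the diameter is 3 links
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(power_to_diameter(path4)[0])  # 3
```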

Some removal scenarios completely disconnected particular nodes from the entire network. These are called "disconnects." A disconnected node is severed from the matrix, and no amount of continued powering of the matrix will connect it. Disconnects were therefore taken into account when determining the presence or absence of zeros among the connected nodes of the network.
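This section does not state exactly how disconnects were identified. One way to find them, sketched below under the assumption that a disconnect is any node outside the largest connected component, is a breadth-first search over the connectivity matrix; this swaps in graph traversal for matrix powering. The function name is hypothetical.

```python
from collections import deque


def find_disconnects(adj):
    """Return the indices of nodes outside the largest connected component."""
    n = len(adj)
    if n == 0:
        return []
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            component.append(u)
            for v in range(n):
                if adj[u][v] and not seen[v]:
                    seen[v] = True
                    queue.append(v)
        components.append(component)
    largest = max(components, key=len)
    return sorted(set(range(n)) - set(largest))


# Hypothetical network: a triangle 0-1-2 plus a separate link 3-4
net = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(find_disconnects(net))  # [3, 4]
```

The remaining nodes can then be passed to the zero check, so that a severed node does not keep the powering loop running indefinitely.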

After the matrix was powered to its diameter, the rows of the product of the last powering were totaled. These totals were used to determine which 25 nodes were at the top of the hierarchy (Table 4-3). An index was then created from the row totals: each row total was divided by the minimum row total among the top 25 nodes in the network. The minimum observation was Salt Lake City-Ogden, with a row total of 538,479,367,663, so its index value is 1.0, the base of the index. This index is the unweighted relative connectivity index (URCI). Figure 4-6 illustrates the hierarchy of the 218 nodes in the U.S. based on the URCI.

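The URCI is defined as

$$\mathrm{URCI}_i = \frac{x_i}{x_{\min}}, \qquad i = 1, \ldots, N, \quad N = 25$$

where $x_i$ is the row total of node $i$ and $x_{\min}$ is the minimum row total among the top 25 nodes. For example, the URCI value for Atlanta (row total = 1,544,357,876,452) within the fully connected network is

$$\mathrm{URCI}_{\text{Atlanta}} = \frac{1544357876452}{538479367663} \approx 2.868$$

Figure 4-7 shows the geographical distribution of the top 25 nodes based on the URCI.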
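The URCI normalization can be sketched in a few lines of Python. The function name and the `top_n` parameter are illustrative; the example reuses the Atlanta and Salt Lake City-Ogden row totals quoted above, plus a hypothetical third node with a small row total.

```python
def urci(row_totals, top_n=25):
    """Divide every row total by the smallest row total among the top_n largest."""
    x_min = sorted(row_totals, reverse=True)[:top_n][-1]
    return [x / x_min for x in row_totals]


# Atlanta, Salt Lake City-Ogden, and a hypothetical low-ranked node
values = urci([1544357876452, 538479367663, 100], top_n=2)
print(round(values[0], 3))  # 2.868
print(values[1])            # 1.0
```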

The network connectivity index (NCI) was obtained by summing values across the entire network. For each multiplied-matrix scenario, the row totals were first converted to URCI values; these converted values were then summed to give a single value representing the overall connectivity of the entire network for that removal scenario. The simple NCI model is

$$X_{\text{total}} = \sum_{i=1}^{218} X_i$$

where $X_i$ is the URCI value of node $i$. The NCI of the fully connected network is 99.499; that is, the row totals, after being converted using the URCI, sum to 99.499.

The percentage change value illustrates the amount of change in network connectivity in each scenario compared to the original, fully connected network. The change was calculated by subtracting the NCI of a given scenario from the original connectivity value and dividing the result by the original value, 99.499:

$$\%\Delta = \frac{x_1 - x_2}{x_1}$$

where $x_1$ is the NCI of the fully connected network and $x_2$ is the NCI of the scenario. For example, the change in connectivity with Chicago-Gary-Kenosha removed (connectivity 36.909), compared with the original network, is

$$\%\Delta = \frac{99.499 - 36.909}{99.499} \approx 0.629$$

The final result gives the proportional change in network connectivity relative to the fully connected network, here a 62.9% reduction.
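The NCI and percentage-change calculations are sketched below with illustrative function names. The percentage change is returned as a fraction; multiply by 100 to express it as a percent.

```python
def nci(urci_values):
    """Network connectivity index: the sum of URCI values over all nodes in a scenario."""
    return sum(urci_values)


def pct_change(original_nci, scenario_nci):
    """Proportional drop in connectivity relative to the fully connected network."""
    return (original_nci - scenario_nci) / original_nci


# Fully connected network versus the Chicago-Gary-Kenosha removal scenario
print(round(pct_change(99.499, 36.909), 3))
```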

Figure 4-6. Cities of U.S. Internet backbone network based on URCI, using graduated symbols to represent connectivity importance (legend, node URCI classes: 0-0.116; 0.116-0.292; 0.292-0.505; 0.505-0.789; 1.339-2.241)

Figure 4-7. Top 25 nodes in U.S. Internet backbone network, using graduated symbols based on URCI (legend classes: 1-1.494; 1.494-1.988; 1.988-2.482; 2.482-2.976; 2.976-3.47)
