A Distributed Approach to Multicast Session Discovery, mDNS - a Globally Scalable Multicast Session Directory Architecture

Material Information

A Distributed Approach to Multicast Session Discovery, mDNS - a Globally Scalable Multicast Session Directory Architecture
Harsh, Piyush
Place of Publication:
[Gainesville, Fla.]
University of Florida
Publication Date:
Physical Description:
1 online resource (134 p.)

Thesis/Dissertation Information

Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering
Computer and Information Science and Engineering
Committee Chair:
Newman, Richard E.
Committee Members:
Chow, Yuan-Chieh R.
Chen, Shigang
Banerjee, Arunava
Shea, John M.
Graduation Date:


Subjects / Keywords:
Bandwidth ( jstor )
Database design ( jstor )
Domain name system ( jstor )
End user searching ( jstor )
End users ( jstor )
Internet ( jstor )
Keyword searching ( jstor )
Keywords ( jstor )
Simulations ( jstor )
Standard deviation ( jstor )
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
architecture, dht, directory, distributed, geocoding, geotagging, internet, mdns, multicast, networks, overlay, session
Electronic Thesis or Dissertation
bibliography ( marcgt )
theses ( marcgt )
Computer Engineering thesis, Ph.D.


This dissertation addresses the problem of multicast session discovery by an end user. IP multicast uses network bandwidth far more efficiently than conventional data transmission strategies, and its use could prove cost effective for many Content Distribution Networks (CDNs). From an end user's perspective, accessing a live stream over multicast results in better video reception quality than unicast transmission, largely because limited line bandwidth is otherwise shared among several competing data streams. Still, multicast deployment in the Internet is very sparse. One reason is low user demand, which stems from lower usability compared to IP unicast. The supporting network infrastructure deployed after the standardization of TCP helped tremendously in improving the usability of IP unicast. The Domain Name Service (DNS) infrastructure allowed users to access target hosts using a Fully Qualified Domain Name (FQDN) string instead of a dotted-decimal IP address. Because unicast IP addresses were allotted in a regulated manner and assignments were long-lived, it became easier to search for and locate resources on the Internet. The lack of such infrastructure support has deprived multicast of usability from an end user's perspective. More importantly, the shared nature of multicast addresses, the short life of address use, and frequent reuse from the common pool make it difficult for the end user to search for and discover content. This dissertation provides a distributed hierarchical architecture that efficiently addresses some of the usability issues raised above. The tree hierarchy, closely co-located with the DNS infrastructure, allows the presented scheme to assign Universal Resource Identifiers (URIs) to multicast streams that an end user can bookmark. The proposed scheme automatically re-maps the correct session parameters to the URIs if they change in the future.
The Distributed Hash Table (DHT) approach for search and discovery of multicast sessions presented in this dissertation uses a tree hierarchy, which is more suitable for the task at hand. Many live multicast streams are not replicated, so the source of the data must be located, and the search scheme required is therefore somewhat traditional in nature. The relative instability of many multicast streams and their associated session parameters makes many traditional P2P DHT schemes unsuitable for the problems addressed in this work. Simulation results and an analytical comparison of the proposed scheme with existing approaches are presented towards the end of this dissertation. A detailed discussion is also presented of why several existing DHT schemes for keyword search, and multicast session discovery schemes based on the Session Announcement Protocol (SAP) and Session Description Protocol (SDP), are unsuitable for the identified problem. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis (Ph.D.)--University of Florida, 2010.
Adviser: Newman, Richard E.
Statement of Responsibility:
by Piyush Harsh.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Applicable rights reserved.
Embargo Date:
Resource Identifier:
004979548 ( ALEPH )
709592957 ( OCLC )
LD1780 2010 ( lcc )



Figure 5-26. Average route stabilization time, scenario 3 (x: beta, y: alpha, z: time in seconds)

Figure 5-27. Route stabilization time standard deviation, scenario 3 (x: beta, y: alpha, z: standard deviation in seconds)

Figure 5-28. Summary chart for latency experiments (series: average, min, and max latency; legend: # Dom = 1 to 5)

Figures 5-30 and 5-31 show the median and the average latency values in milliseconds. The x-axis shows the number of domains.

The significant jump in discovery latency from one domain to more than one is due to the 'MSDPROBE' and 'REDIRECT' protocol steps involved in domain-external searches, as compared to the domain-local search that is the only case when simulating with just one domain. In the experiments we performed, session registration was performed at a randomly chosen domain and search initiation was done immediately after

[59] S. Q. Zhuang, B. Y. Zhao, A. D. Joseph, R. H. Katz, and J. D. Kubiatowicz, "Bayeux:
an architecture for scalable and fault-tolerant wide-area data dissemination," in
NOSSDAV '01: Proceedings of the 11th international workshop on Network and
operating systems support for digital audio and video. New York, NY, USA: ACM,
2001, pp. 11-20.

[60] A. Rowstron and P. Druschel, "Storage management and caching in PAST, a
large-scale, persistent peer-to-peer storage utility," SIGOPS Oper. Syst. Rev.,
vol. 35, no. 5, pp. 188-201, 2001.

[61] A. I. T. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel, "SCRIBE: The
design of a large-scale event notification infrastructure," in NGC '01: Proceedings of
the Third International COST264 Workshop on Networked Group Communication.
London, UK: Springer-Verlag, 2001, pp. 30-43.

[62] M. Castro, P. Druschel, A.-M. Kermarrec, and A. Rowstron, "One ring to rule them
all: service discovery and binding in structured peer-to-peer overlay networks," in
EW 10: Proceedings of the 10th workshop on ACM SIGOPS European workshop.
New York, NY, USA: ACM, 2002, pp. 140-145.

[63] R. Droms, "Automated configuration of TCP/IP with DHCP," Internet Computing,
IEEE, vol. 3, no. 4, pp. 45-53, Jul/Aug 1999.

[64] "Dynamic host configuration protocol," RFC 2131 (Draft Standard), Mar 1997,
updated by RFCs 3396, 4361. [Online]. Available:
[Accessed: July 20, 2010]

[65] R. Karedla, J. S. Love, and B. G. Wherry, "Caching strategies to improve disk system
performance," Computer, vol. 27, no. 3, pp. 38-46, 1994.

[66] E. J. O'Neil, P. E. O'Neil, and G. Weikum, "The LRU-K page replacement algorithm
for database disk buffering," in SIGMOD '93: Proceedings of the 1993 ACM
SIGMOD international conference on Management of data. New York, NY, USA:
ACM, 1993, pp. 297-306.

[67] "The Network Simulator ns-2." [Online]. Available:
[Accessed: July 21, 2010]

[68] P. Harsh, "mDNS simulation data access website." [Online]. Available: [Accessed: July 21, 2010]

[69] Y.-h. Chu, S. G. Rao, and H. Zhang, "A case for end system multicast (keynote
address)," in SIGMETRICS '00: Proceedings of the 2000 ACM SIGMETRICS
international conference on Measurement and modeling of computer systems.
New York, NY, USA: ACM, 2000, pp. 1-12.




at that node. Once the 'mDNS' structure stabilizes, the use of redirect and indirection caches at every MSD should tremendously reduce the routing burden at nodes closer to the root in the DHT hierarchy.

Reliability in the face of domain failures could be improved. Although storing a shadow session record at a different location helps alleviate the issue, a more robust domain failure safeguard algorithm could be designed. The authentication and security aspects of inter-domain communication, especially validation and verification of control messages, have to be worked on. However, with the transition from ASM to SSM mode, some of the more flagrant security issues in IP multicast should take care of themselves, such as spurious cross traffic and an unhindered sender policy in which the sender need not be part of the multicast group to which it sends data.

4.8 Conclusion

This chapter described the integration of the DHT hierarchy and the URS based URL scheme into the larger picture of the 'mDNS' architecture. It described how 'mDNS' is capable of coexisting in both ASM and SSM multicast environments. Various failure scenarios were visited and analyzed in some detail. Towards the end, this chapter revisited the design goals listed in Chapter 2 and argued how the overall architecture achieves those goals. It also briefly described some areas of improvement in the proposed architecture.

I dedicate this to my parents who always supported my decisions and all my teachers
who made me the person I am today.

@ dom00

Scenario 3 domain 10

@ dom01

Scenario 3 domain 1

@ dom02


associated IP address as a factor that allows DNS entries to be cached. With
inherent instability in the multicast addresses, DNS in its current format can not be
2. Lack of a standard content discovery mechanism: as multicast contents have
transient life cycles with varied duration and availability, it becomes almost
impractical for modern search engines to crawl the multicast content space and
maintain crawl data for contents whose availability is at best uncertain. There
is a lack of a standard service that would allow end users to locate multicast

Traditionally, users get information about the time and duration of popular multicast groups through Usenet groups and through emails from friends. Clearly, these discovery mechanisms are not scalable if multicast is to become a user-driven technology. There has to be a standard and scalable service that allows end users to locate existing multicast sessions in almost real time. Improving usability will ensure the next wave of user acceptance for multicast technology. Let us now examine why ISPs have been reluctant to deploy multicast.

ISPs perspective: network complexity

Compared to unicast, where the core routers have to perform routing table updates periodically and do packet forwarding, a multicast router has to execute numerous protocols. We have already seen what it takes to make multicast work in a modern network. Supporting the ASM mode of operation is especially complex, as the responsibility of source discovery rests with the routers (configured as RPs). The two-step PIM-SM protocol, where hosts initially get data via shared distribution trees rooted at RPs and later switch to a shortest path tree (SPT), adds complexity. With the addition of source filtering in IGMP v3 [31] and MLD v2 [29], the receiver gains the capability to specifically denote a set of sources from which it is interested in getting the multicast data.

Network researchers agree that single source multicast with strict unidirectional data flow from the source to interested hosts is much easier to implement. If, further, source discovery is made a user prerogative, the network will be released from the added burden of running the MSDP protocol and maintaining a list of active senders. Further

among them are OceanStore [58], which is a wide-area persistent distributed storage system meant to scale the globe, and Bayeux [59], an application-level multicast protocol.

Pastry

Pastry [37] is an application layer overlay developed in a collaboration between Rice University and Microsoft Research. Each node in Pastry is assigned a unique 128-bit ID that indicates its position in the circular ID space. Every Pastry node maintains a routing table, a neighborhood set, and a leaf set that help the overlay deal with intermittent node failures. The neighborhood set contains a predefined number of nodes that are closest to the given node based on some set proximity criterion, whereas the leaf set contains nodes whose nodeIDs are closest to the current node's ID. The neighborhood set is not used in routing itself but is used to guarantee locality bounds in routing. The routing table has $\lceil \log_{2^b} N \rceil$ rows with $2^b - 1$ entries in each row, where 'b' is a configuration parameter typically set to 4 by the authors. The routing scheme is prefix based and is very similar to the one adopted by Tapestry [49] [56]. Several successful applications have been developed that use Pastry as their routing base. Notable among them are PAST [60] and SCRIBE [61]. A global bootstrapping service [62] for any application layer overlay has also been proposed that uses Pastry as its routing base.

Kademlia

Kademlia [38] has several attractive features compared to other application overlays. It minimizes the number of configuration messages that nodes must exchange in order to find out about each other. It uses the XOR of nodeIDs as a measure of distance between two nodes. Because of the symmetric nature of XOR, nodes participating in a Kademlia overlay learn useful routing information from the keyword queries they receive. Other DHTs lack this ability. Additionally, Kademlia uses a unified routing algorithm from beginning to end, regardless of the proximity of intermediate nodes to the target node. This simplifies the routing algorithm quite significantly. Nodes are treated as leaves in a binary tree where each node's position in the tree is determined by the shortest unique
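The XOR distance measure mentioned in the Kademlia description can be illustrated with a small sketch. The function names and the 4-bit example IDs are hypothetical and chosen only for readability; real Kademlia nodeIDs are much longer.

```python
def xor_distance(node_a: int, node_b: int) -> int:
    """Distance between two nodeIDs, taken as their bitwise XOR."""
    return node_a ^ node_b


def closest_nodes(target: int, node_ids: list[int], k: int) -> list[int]:
    """Return the k known nodeIDs closest to the target under XOR distance."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


# The metric is symmetric: the distance from a to b equals that from b to a.
a, b = 0b1011, 0b0010
print(xor_distance(a, b), xor_distance(b, a))

# Pick the two known nodes nearest to target 0b1000.
print(closest_nodes(0b1000, [0b0001, 0b1001, 0b1100, 0b0111], k=2))
```

Because the distance is symmetric, a node that receives a query from a peer can treat that peer as a useful routing contact at the same distance, which is the property the text refers to.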


Scenario 1 domain 8


Scenario 1 domain 9

The 'Virtual DNS' settings for domain hierarchy scenario 2 are presented next. The domain numbers are as shown in the hierarchy figure earlier in Chapter 5.

@ dom00
@ dom01
@ dom02
@ dom000


Figure 3-2. Typical steps in 'mDNS' URI name resolution (diagram labels: root DNS server, TLD DNS server, local DNS server, multicast stream)

3.3.3 Additional Usage
As mentioned earlier, a URS is also used as a bootstrapping device. The system
administrator configures a few parameters while setting up the URS. These parameters
include -
* PMCAST: the parent's multicast communication channel
* CMCAST: the children's multicast communication channel
* supported IGMP version
* 'mDNS' parent domain's URL string
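As a rough illustration, the four parameters listed above could be held in a small validated configuration record. This is only a sketch: the class name, the field names, the assumption that each channel is a bare multicast group address, and the example values are all hypothetical and not taken from the dissertation.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class UrsBootstrapConfig:
    pmcast: str        # parent's multicast channel (assumed: a group address)
    cmcast: str        # children's multicast channel (assumed: a group address)
    igmp_version: int  # supported IGMP version
    parent_url: str    # 'mDNS' parent domain's URL string

    def __post_init__(self) -> None:
        # Reject channel values that are not multicast group addresses.
        for name in ("pmcast", "cmcast"):
            address = ipaddress.ip_address(getattr(self, name))
            if not address.is_multicast:
                raise ValueError(f"{name} is not a multicast address: {address}")
        if self.igmp_version not in (1, 2, 3):
            raise ValueError(f"unsupported IGMP version: {self.igmp_version}")


# Hypothetical example values.
config = UrsBootstrapConfig(
    pmcast="239.1.1.1",
    cmcast="239.1.1.2",
    igmp_version=3,
    parent_url="mcast.example.edu",
)
print(config)
```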
MSD servers use these parameters to set up necessary communication channels in
order to join the 'mDNS' service hierarchy. Parent domain's URL string is needed in

As immediate session search visibility is a major design goal, the system is not a crawler

based architecture like traditional search engines. Since the multicast space is not

well organized, use of crawlers is not even feasible. The system is a registration based

design where the content host (source) registers the session only with its domain local

system component. The system makes the session visible globally. The design details

and architecture components are described next.

2.2 Distributed Hash Table

Secure hashing [34] is a reasonable tool that achieves equitable distribution of data

over multiple sites. In a Distributed Hash Table (DHT) scheme, the record's keyword is

hashed. The hash value determines where the actual record will be stored for retrieval

later. In "mDNS" search architecture, each data site or Multicast Session Directory

(MSD) manages two types of records. These are called -

* Domain local session database records
* Global session database records

Each site or MSD maintains three databases -

* Locally Scoped Sessions Database
* Globally Scoped Sessions Database
* Geographical Database

These data sites are linked to one another in a tree hierarchy with a single root node.

The DHT hash space is divided among all the participants in the tree overlay. The

algorithms managing the hash space distribution and redistribution in the face of

topology changes are discussed later in this chapter.
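A minimal sketch of the keyword-to-site mapping described above follows. It assumes SHA-1 as the secure hash, a 16-bit hash space, case-folded keywords, and contiguous hash ranges per MSD; none of these specifics come from the text, and the MSD names are hypothetical.

```python
import hashlib
from bisect import bisect_left

HASH_BITS = 16  # assumed size of the hash space, for illustration only


def keyword_hash(keyword: str) -> int:
    """Hash a keyword into the range [0, 2**HASH_BITS)."""
    digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[: HASH_BITS // 8], "big")


def owner_of_hash(value: int, upper_bounds: list[int], msd_names: list[str]) -> str:
    """Find the MSD whose range contains value.

    upper_bounds holds the inclusive upper bound of each MSD's range,
    sorted ascending, with the last bound equal to 2**HASH_BITS - 1.
    """
    return msd_names[bisect_left(upper_bounds, value)]


def responsible_msd(keyword: str, upper_bounds: list[int], msd_names: list[str]) -> str:
    """Map a session keyword to the MSD that should store its record."""
    return owner_of_hash(keyword_hash(keyword), upper_bounds, msd_names)


bounds = [0x3FFF, 0x7FFF, 0xFFFF]          # three contiguous ranges
names = ["msd-root", "msd-dom01", "msd-dom02"]
print(responsible_msd("gator", bounds, names))
```

Every participant that applies the same hash function and knows the current range assignment arrives at the same storage site for a keyword, which is what lets registration and search meet at one MSD.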

2.2.1 Records Structure

Figure 2-1 shows the components that make up an administratively scoped multicast session (local session) record and a globally scoped multicast session record. The importance of some of the data elements will be revealed in the next two chapters.

A brief explanation of the various fields follows next -

Figure 5-36. Average of weighted scores, scenario 3 (x: beta, y: alpha, z: score)

Figure 5-37. Standard deviation of weighted scores, scenario 3 (x: beta, y: alpha, z: standard deviation)

Figure 5-32 shows the average weighted scaled scores for simulation scenario 1 for three experimental runs. The figure has been drawn using weights of 0.5 for routing table switches, 0.3 for route stabilization time, and 0.2 for hash-skew value. Figure 5-34 shows the average weighted scaled scores for scenario 2, and Figure 5-36 shows the averages of weighted scaled scores for simulation scenario 3. The x-axis represents $\beta$ values from 0.1 to 2.0, the y-axis represents $\alpha$ ranging from 0.1 to $\beta$, and the z-axis represents the scaled weighted score.
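The weighted combination described above can be sketched as follows. The weights are the ones quoted in the text; the metric key names are hypothetical, and the inputs are assumed to be already scaled, since the scaling procedure is not given in this excerpt.

```python
# Weights quoted in the text for the three metrics.
WEIGHTS = {
    "routing_table_switches": 0.5,
    "route_stabilization_time": 0.3,
    "hash_skew": 0.2,
}


def weighted_score(scaled_metrics: dict[str, float]) -> float:
    """Combine already-scaled metric values into one weighted score."""
    return sum(WEIGHTS[name] * scaled_metrics[name] for name in WEIGHTS)


# Hypothetical scaled values for one (alpha, beta) grid point.
sample = {
    "routing_table_switches": 2.0,
    "route_stabilization_time": 1.0,
    "hash_skew": 0.5,
}
print(weighted_score(sample))
```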

Looking at the weighted score plot, one can see that for the scenario 1 simulation setup, the best performance is achieved if $\alpha, \beta \in [1.8, 2.0]$ with $\alpha < \beta$. For scenario 2, the optimal system performance is achieved at $\alpha \in [0.4, 1.0]$ and $\beta \in [1.8, 2.0]$, to report some of the values. For scenario 3, the system performed better with $\alpha, \beta \in [1.2, 2.0]$ with $\alpha < \beta$.

Considering the simulation results, it is clear that the choice of $\alpha$ and $\beta$ depends on the network topology. A system administrator is free to choose values of their liking, although it is advisable to follow the common selection guidelines for the full hierarchy. In order to maintain global routing table stability, a relatively high value of $\beta$ is suggested, and for routing table stability at the subtree level, a higher value of $\alpha$ is advised.



ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1 GENERAL INTRODUCTION
  1.1 IP Multicast
    1.1.1 Why Multicast?
    1.1.2 Requirements for Enabling/Using Multicast
      Multicast addressing
      Multicast routing
      IGMP/MLD: Internet group management
      Users perspective: low usability
      ISPs perspective: network complexity
  1.2 What This Dissertation Tries to Solve?
  1.3 Conclusion

  2.1 Design Goals
  2.2 Distributed Hash Table
    2.2.1 Records Structure
    2.2.2 DHT Hierarchy Construction
    2.2.3 DHT Operations
      Addition of a domain
      Removal of a domain
      Addition of session record
      Deletion of a session record
    2.2.4 DHT Stability
  2.3 Supporting Multicast Session Discovery
    2.3.1 Database Design
      Global records database
      Local records database
      Geo-tagged database
    2.3.2 Associated Algorithms
      Session registration
      Session search
      Recovering from parent node failures


Figure 2-11. Parent node failure recovery strategy


SSM is significantly different from ASM, as only the source node at which the distribution tree is rooted is allowed to transmit data along the tree to the interested recipient hosts, and thus the name "Source-Specific Multicast".

1.1.1 Why Multicast?

IP multicast offers tremendous bandwidth benefits to the source as well as better

quality of service (QoS) perception to the end users. In multicast, the core network

does data stream replication along the branches in the distribution tree. Along any path

from the root to the leaf node, there exists just one data stream. Compare this strategy

using IP multicast to the data distribution using IP unicast where data stream replication

must be done at the source itself. The bandwidth requirement at the source in unicast

increases linearly with the number of subscribers interested in receiving the data stream.
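The linear growth at the source can be made concrete with a short sketch. The function name and the example stream rate are hypothetical; the model simply follows the description above, with one stream per subscriber for unicast and a single stream for multicast.

```python
def source_bandwidth_mbps(stream_rate_mbps: float, subscribers: int, multicast: bool) -> float:
    """Outgoing bandwidth needed at the source for a given audience size."""
    if subscribers < 0:
        raise ValueError("subscribers must be non-negative")
    if subscribers == 0:
        return 0.0
    # Multicast: the network replicates the stream, so the source sends it once.
    # Unicast: the source sends one copy per subscriber.
    return stream_rate_mbps if multicast else stream_rate_mbps * subscribers


for n in (1, 4, 100):
    print(n, source_bandwidth_mbps(2.0, n, multicast=False),
          source_bandwidth_mbps(2.0, n, multicast=True))
```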



Figure 1-1. Data transmission in unicast v multicast

In Figure 1-1, the source node has to replicate the data stream 4 times to support

4 recipient hosts. There is higher bandwidth load on intermediate sections of the core

network as well. Comparing this with the case where the source node is transmitting

data using multicast, the sender just provides one data stream and the core network

[70] S. E. Deering, "Multicast routing in internetworks and extended LANs," in SIGCOMM
'88: Symposium proceedings on Communications architectures and protocols.
New York, NY, USA: ACM, 1988, pp. 55-64.

[71] G. Camarillo and M. A. Garcia-Martin, The 3G IP Multimedia Subsystem (IMS):
Merging the Internet and the Cellular Worlds. John Wiley & Sons, 2006.

[72] R. Kalden, I. Meirick, and M. Meyer, "Wireless internet access based on GPRS,"
2000. [Online]. Available: 10.1.1.
11.7851 [Accessed: July 20, 2010]


@ dom000
@ dom01

Scenario 1 -domain 1


Scenario 1 domain 4

@ dom020

Table 1-1. IANA assigned multicast addresses (few examples)
Address (IPv4) | Address (IPv6) | Usage | Scope
 | FF02:0:0:0:0:0:0:1 | All Node Addresses | Link Local
 | FF02:0:0:0:0:0:0:2 | All Multicast Routers | Link Local
 | FF01:0:0:0:0:0:0:1 | All Node Addresses | Node Local
 | FF05:0:0:0:0:0:1:3 | All DHCP Servers | Site Local
 | FF0X:0:0:0:0:0:0:130 | UPnP | All Scopes

Keeping all these restrictions in view, the Internet Assigned Numbers Authority (IANA)

adopted a somewhat relaxed attitude towards multicast addresses.

IANA assigned the old class D address space for multicast group addressing. All

addresses in this range have 1110 prefix as the first 4 bits of IPv4 address. Therefore,

IP multicast addresses range from Multicast addresses in

IPv6 are identified by the first octet of the address being set to 0xFF. In the earlier days, the scope of a multicast data packet was determined by Time to Live (TTL) scoping rules. Over time, TTL scoping was found to be confusing to implement and manage in the

deployed networks. As IP multicast gained some traction, IANA started to manage the

address space more efficiently. Table 1-1 shows some of the addresses that have been

assigned by the IANA and their intended purpose and valid scopes.

In IPv4 to has been reserved as Administratively

Scoped [8] multicast addresses. Data transmitted on these groups are not allowed to

cross the administrative domain boundaries. For IPv6 this range is defined as FFx4::/16.

FFxE::/16 is defined as global scope, i.e. data packets addressed to this address range

are eligible to be routed over the public internet. Figure 1-4 shows the general format of

IP multicast addresses in IPv6.

8 4 4 112
11111111 flags scope Group identification

Figure 1-4. Multicast address format in IPv6
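The field layout in Figure 1-4 can be unpacked with a few bit operations. The sketch below uses Python's standard `ipaddress` module; the function name is hypothetical, and the scope labels follow the wording used in this chapter and in Table 1-1.

```python
import ipaddress

# Scope values named in the text and in Table 1-1.
SCOPE_NAMES = {
    0x1: "node local",
    0x2: "link local",
    0x4: "administratively scoped",
    0x5: "site local",
    0xE: "global",
}


def parse_ipv6_multicast(address: str) -> tuple[int, int, int]:
    """Split an IPv6 multicast address into (flags, scope, group id)."""
    value = int(ipaddress.IPv6Address(address))
    if value >> 120 != 0xFF:                 # first 8 bits must be 11111111
        raise ValueError(f"not an IPv6 multicast address: {address}")
    flags = (value >> 116) & 0xF             # next 4 bits
    scope = (value >> 112) & 0xF             # next 4 bits
    group_id = value & ((1 << 112) - 1)      # remaining 112 bits
    return flags, scope, group_id


flags, scope, group_id = parse_ipv6_multicast("FF05:0:0:0:0:0:1:3")
print(flags, SCOPE_NAMES.get(scope, "other"), hex(group_id))
```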

For session discovery latency experiments, we used the following list of keywords

for session registration. Each keyword generated one registration request, and the

same keyword was used for session search which immediately followed the registration


gator, hindi, rediff, football, soccer, movies, audio, songs, picture, piyush, amrita,
table, dinner, restaurant, match, base, tyre, car, couch, potato, refrigerator, shelf,
motorcycle, sweater, shirt, dress, purse, mobile, watch, clock, top, jacket, coat, idol,
deity, kitchen, market, mall, road, footpath, spectacle, television, knife, board, onion,
jalapeno, beer, time, mouse, telefone, pen, cover, case, copy, book, pencil, light, bulb,
fan, tape, suitcase, paper, garland, garden, flower, carpet, tie, necklace, lens, camera,
battery, cake, icing, sugar, milk, egg, water, envelope, drawer, cheque, belt, shoe,
slipper, scanner, cards, rocket, shuttle, tennis, ball, legs, hands, fingers, nail, toe,
hammer, srew, plier, match-stick, gun, fun, park, swing, slope, ranch, grass, bike, helmet,
gear, gloves, batter, pillow, quilt, tissue, mop, broom, cargo, sweet, perfume, frangrance,
meat, butter, salt, tea, coffee, ground, boil, receipt, plastic, floor, wire, number, frown,
torch, rope, tent, camp, row, boat, tide, river, stream, ocean, mountain, mushroom,
fungi, algae, ferns, leaf, bud, eggplant, cucumber, radish, mustard, honey, oil, pan,
spatula, mixer, dough, juice, cook, cookie, spice, walnut, cinnamon, eat, jump, hop,
run, play, alligator, turtle, fish, snake, slime, moss, bullet, cannon, lamp, medicine,
vitamin, cholera, disease, hospital, doctor, nurse, patient, foot, malaria, scalp, ear, throat,
drink, force, hair, long, dictionary, speaker, album, mirror, lip-stick, petroleum, gasoline,
flourine, asbestos, arsenic, mild, wild, animal, deep, blue, whale, dolphin, puppy, birds,
aquarium, radium, mars, planet, solar, sun, rays, ozone, atmosphere, aeroplane, flight,
orange, pretzel, dance, salsa, latino, pepper, good, sauce, scream, shout, yell, radio,
next, rock, guitar, saxophone, castle, stairs, porch, patio, change, pool, fry, saute, grind,
burn, churn, turn, garbage, dust-bin, bun, noodles, rice, ring, police, jeep, truck, bus,
children, school, nursery, animation, alien, combat, challenge, whip, leash, cream, pie,
hat, bat, door, kid, prank, switch, blanket, death, fear, insect, net, mosquito, robot, laser,
robot, hello, greet, smile, grin, strap, breeze, wind, air, gale, hurricane, storm, rain,
current, ship, yatch, enough

Data was collected using up to 5 domains connected according to the scenario 3 hierarchy. The 'Virtual DNS', MSD, and URS server parameters were set up according to the configuration details for domains 10, 1, 4, 5, and 2 provided earlier.


The main database is a three-level structure. The key to be searched is first hashed to find its location in the hash table. Then the overflow linked list of target keys is traversed to locate the desired key. The third level is the linked list of all the session records associated with the requested search keyword. The actual session record structure for a globally scoped multicast session was shown earlier in Figure 2-1.

Local records database

The construction of the local records database is very similar to that of the global records database. The difference is that it stores all the session records whose registration requests originate from within its own domain. The sessions are stored irrespective of whether they are administratively scoped or globally scoped.

Geo-tagged database

Each multicast session record is geo-tagged based on the location data that the session creator provides during the session registration phase. The location data would normally be the location where the content originates, but in some cases it can also depend on the nature of the content itself. The inclusion of geographical information allows end users to fine-tune a multicast search by using proximity as a search criterion in addition to the keyword search parameters.

Figure 2-7 shows the idea behind the database construction. It shows the earth coordinate system and the schematic representation of the geo-tagged database. Geographic locations on earth can be addressed precisely using latitude and longitude coordinates. Latitudes vary from -90° to +90° along the south-north corridor. Similarly, longitudes vary from -180° to +180° along the west-east corridor. Latitudes are parallel to each other and are equidistant. Every degree of separation between latitudes equals 110.9 km in ground distance. The distance relationship between longitudes is not that straightforward because they converge at the poles. This relationship is further complicated as earth is not a perfect sphere.
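To illustrate how longitude spacing shrinks towards the poles, the sketch below computes great-circle distance with the haversine formula. It treats the earth as a perfect sphere with a mean radius of 6371 km, which is an approximation introduced here and not the dissertation's method; with that radius one degree of latitude comes out slightly above the 110.9 km figure quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the earth is not a perfect sphere


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude covers less ground as latitude increases.
for latitude in (0, 30, 60, 85):
    print(latitude, round(great_circle_km(latitude, 0, latitude, 1), 1))
```

A proximity search could use such a distance to keep only those session records whose geo-tag lies within a chosen radius of the user's location.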

@ dom0201

Scenario 3 domain 7

@ dom0202

Scenario 3 domain 8


Scenario 3 domain 9



3.1 IP Unicast vs Multicast

One of the prominent reasons for the higher deployment of, and end user demand for, IP unicast as compared to multicast is the ease of use associated with unicast. IP unicast has unique IP addresses assigned to networked hosts, and for many hosts on the Internet that provide services of some form to end users, these addresses are assigned long term (static). That allows mnemonics to be assigned to dotted-decimal IP addresses. With a global translation service, an end user needs to know only the mnemonic to access a desired resource such as an HTML page or an email in-box. This global name translation between mnemonics (or URLs / domain names) and dotted-decimal IP addresses is provided by the Domain Name Service (DNS) [32]. The use of domain names and URLs has made the Internet more usable for end users.

Most of the content made available to end users on the Internet is static and is available long term. A web page hosted somewhere will most likely be found at the same location for many weeks to come. This quasi-permanence of the data allows search engines like Yahoo and Google to crawl the web and index the content. These web indexes can be searched by end users to locate the content they desire. The existence of web indexes and the DNS service has, without argument, made the Internet a much more usable technology today compared to its early days.

The scenario for IP multicast is totally different. Group addresses for content delivery to interested users are not permanent. Further, the content streams transmitted over such multicast groups are typically very dynamic in nature. Generally speaking, IP multicast traffic is not crawler friendly, and the transient nature of the sessions makes indexing almost impossible. Content discovery similar to that provided by web search engines, which would allow end users to locate a session of interest, is nonexistent in the Internet. The approach provided in Chapter 2 addresses that.

query. The search queries are sent asynchronously and in parallel in order to reduce

possible delays due to timeouts and various other network related artifacts.

Algorithm 3: MSD search algorithm

{Incoming: search query from end user in the self domain}
if search scope set as administrative then
    {Query Local Records database only, filter administratively scoped sessions}
    {If needed, cross-reference Geo-DB database and return the candidates as result}
if search scope set as global then
    foreach keyword in search query do
        if keyword found in redirection cache then
            {send 'redirect' with necessary connection details to search client}
        else
            {send 'msd-probe' with target keyword towards target remote MSD}
            {route 'msd-probe' using keyword routing table}
            {wait for 'msd-probe-reply' from remote servers}

Algorithm 3 shows what happens when a search query is received at the MSDd

from a search client in the same domain. Let us take a look at what happens when the

search query comes from an external search client. In that case, only globally scoped

sessions are searched because sending administratively scoped session details to a

if 'msd-probe-response' message is received then
    {enter keyword and the MSD connectivity information into redirection-cache or inversion-cache}
    {send 'redirect' with new connectivity details to the search client}
    // after receiving the 'redirect' message, the search client is supposed to
    // initiate the 'ext-search' protocol exchange sequence with the target MSD

    // clients can send 'invalidate' back to the MSD server if the remote
    // server no longer maintains session records for the requested keyword; in
    // that case the MSD invalidates the stale cache entry for that keyword and
    // sends 'msd-probe' again to refresh the state entry

    // if the client decides that the remote MSD server is down, it can ask the
    // MSD in its local domain for backup server details, which the server finds
    // out by sending 'msd-probe' with bit inversion set to TRUE
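The global branch of the search algorithm, with its per-keyword cache check and parallel, asynchronous probes, might be sketched as follows. This is only an illustration: `send_msd_probe` is a hypothetical stand-in for the real 'msd-probe' network exchange, and the function names, the timeout value and the returned address strings are all assumptions.

```python
import asyncio


async def send_msd_probe(keyword: str) -> str:
    """Hypothetical stand-in for routing an 'msd-probe' and awaiting its reply."""
    await asyncio.sleep(0.01)  # simulated network delay
    return f"msd-for-{keyword}.example"


async def resolve_keyword(keyword: str, redirection_cache: dict) -> tuple:
    """Return (keyword, target MSD details), probing only on a cache miss."""
    if keyword in redirection_cache:
        return keyword, redirection_cache[keyword]
    target = await send_msd_probe(keyword)
    redirection_cache[keyword] = target  # remember the answer for later searches
    return keyword, target


async def global_search(keywords, redirection_cache, timeout=1.0) -> dict:
    """Resolve every keyword in parallel; probes that time out are dropped."""
    tasks = [
        asyncio.wait_for(resolve_keyword(k, redirection_cache), timeout)
        for k in keywords
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return dict(r for r in results if not isinstance(r, Exception))


if __name__ == "__main__":
    cache = {"news": "cached-msd"}
    print(asyncio.run(global_search(["football", "news"], cache)))
```

Issuing the probes concurrently means that one slow or unreachable remote MSD delays the overall search by at most the timeout, rather than by the sum of all individual delays.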

someone sitting at a far away location. Geo-tagging such multicast sessions would help

people discover relevant sessions faster and with higher accuracy. In fact in some cases,

multicast sessions could also foster better inter-agency coordination enabling them to

orchestrate an efficient relief program in the affected areas.

Geo-tagged multicast sessions could also herald an era of real-time yet discoverable

citizen news reporting by eyewitnesses at news sites. Consider a scenario where a

major traffic pileup has occurred on I-95. A few eyewitnesses at the accident site may start

a live video feed using their camera phones (modern cell phones are packing in more

and more compute power) over 3G [71] or GPRS [72], register the multicast session

using descriptive keywords such as I-95, pileup, accident, etc., and let the whole world

Raging California wild fires have made the county officials issue voluntary

evacuations. Homeowners who decide to move out are always on their toes to find

out the status of their homes. A few daredevils who decide to stay back could start a

video feed of their surroundings. Geo-tagging their sessions with the relevant location would

make such sessions discoverable with more accuracy, and homeowners who evacuated

could find out the status of that area.

Furthermore, network traffic sourced from a nearby location is generally more

reliable and less affected by network vagaries. Link capacities and traffic profiles have a

tremendous impact on the quality of sessions that have a larger hop count. Therefore

usually one would want to get contents from sessions hosted from a location near


These are a few scenarios among many that suggest that geo-tagging of multicast

sessions could have significant impact on the way people would consider using multicast

in the future. Not only would multicast be a viable alternative for transmitting live broadcasts

on the Internet, but it would also become more appealing to the general public, which would

help create consumer demand for multicast services. It would also enable

replicates the data efficiently along the branches in the distribution tree. The overall

network load is also much lower in the case of multicast as compared to the unicast

case. Use of multicast makes economic sense where all the recipients are interested

in receiving same data stream. Live events broadcast to subscribers using multicast

should be preferable over unicast delivery model.

Network bandwidth is a shared resource. Available bandwidth in the core

network is generally shared among all competing traffic. Ongoing debate on net

neutrality is in favor of maintaining this unbiased sharing of core network bandwidth.

Assuming that the competing data streams receive fair share of the link bandwidth,

overall stream data rate will be governed by the bandwidth it receives at the bottleneck

link along the route from source to destination node. In such a scenario, multicast can

play a big role in improving the perceived QoS at the receiving host.



Figure 1-2. Perceived data rate in unicast v multicast

Let us assume that the bottleneck link has a capacity of 200 Kbps as shown in Figure 1-2. In

the unicast scenario, there are 4 data streams sharing that bottleneck link, and therefore,

assuming a fair share of bandwidth, each stream gets a 50 Kbps rate. Even though the

recipient node's immediate network may be capable of transmitting data at a much

higher rate, the unicast model becomes a limitation on the QoS perception at the

recipient's end. In contrast, in the multicast scenario, since there is just 1 stream along the bottleneck link, and

since the network replicates the streams as and when required along the branches

in the distribution tree, each recipient node receives the stream at 200 Kbps thereby

improving the QoS perception at recipient nodes.

Multicast also offers tremendous cost benefits to the content providers. A provider

transmitting data using multicast can potentially serve large subscriber base using a

small server farm as the bandwidth requirement would be fairly constant regardless

of the number of subscribers. In contrast, with unicast, the bandwidth requirement at

the source (content provider) grows linearly with the number of subscribers. A popular

content provider may potentially need to manage a large server farm and purchase

larger bandwidth at a premium from its Internet Service Provider (ISP).
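The contrast between constant and linearly growing sender bandwidth can be made concrete with a small calculation. The sketch below uses the 100 Kbps base stream rate assumed for Figure 1-3; the function name is hypothetical, and the model ignores protocol overhead.

```python
def source_bandwidth_kbps(stream_kbps: float, subscribers: int, multicast: bool) -> float:
    """Bandwidth needed at the content host to serve all subscribers.

    Unicast sends one copy per subscriber; multicast sends a single copy
    that the network replicates along the distribution tree.
    """
    if subscribers == 0:
        return 0.0
    return float(stream_kbps) if multicast else float(stream_kbps * subscribers)


if __name__ == "__main__":
    for hosts in range(0, 11):
        unicast = source_bandwidth_kbps(100, hosts, multicast=False)
        mcast = source_bandwidth_kbps(100, hosts, multicast=True)
        print(f"{hosts:2d} hosts: unicast {unicast:6.0f} Kbps, multicast {mcast:6.0f} Kbps")
```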

[Plot: x-axis, number of hosts (0 to 10); series: unicast, multicast]

Figure 1-3. Bandwidth requirements vs number of recipients in unicast and multicast

Figure 1-3 shows the bandwidth requirements as the number of recipients grows for

unicast versus multicast mode of data transmission at the content host (sender). The

figure assumes that the base data stream transmission is at 100 Kbps and the core



2.3.2 Associated Algorithms

Algorithms play a pivotal role in a distributed system. They act as the gel that brings

separated components seamlessly together and provide location and failure transparency

to end users. In this section we present the key algorithms that aim to make the user's

multicast session search experience a seamless one. In the architecture based on the

DHT scheme discussed above, a content provider (a.k.a. multicast session creator)

must register its session with a local MSD server. Users generally present search

queries to the local MSD server, although in some cases they can do a domain-specific

search. All this works well if the hierarchy remains connected. In rare cases, however, an

upstream domain may leave the hierarchy (gracefully or abruptly), and the overall system

has to cope with the situation as best as possible. Let us look at these algorithms and

protocols in some detail now.

Session registration

Every domain that participates in the "mDNS" service hierarchy has at least one

MSD server hosted in the domain. Any session that is created or hosted from within that

domain must register the session details with the MSDd server in that domain. Figure

2-8 shows the screen-shot of the session registration tool implemented as part of the

'proof of concept' demonstration of this service.

As part of the registration data, the content host must provide a valid location, list

of keywords that closely describe the session content, scope of the session and other

associated parameters. The MSD server on receiving the registration request, creates

a session record for every keyword specified in the request and stores those records

under 'Local Records Database' regardless of the scope of the session.
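The local part of registration, one session record per keyword stored regardless of scope, might look like the following sketch. The class and field names are hypothetical and are not taken from the proof-of-concept implementation; the multicast address and coordinates are example values only.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecord:
    keyword: str
    group_address: str
    location: tuple  # (latitude, longitude)
    scope: str       # "global" or "administrative"


@dataclass
class LocalRecordsDatabase:
    # keyword -> list of session records registered under that keyword
    records: dict = field(default_factory=dict)

    def register(self, group_address, location, keywords, scope):
        """Create and store one record per keyword, whatever the scope is."""
        created = []
        for keyword in keywords:
            record = SessionRecord(keyword, group_address, location, scope)
            self.records.setdefault(keyword, []).append(record)
            created.append(record)
        return created


if __name__ == "__main__":
    db = LocalRecordsDatabase()
    db.register("239.1.1.1", (29.65, -82.32), ["news", "traffic"], "administrative")
    print(sorted(db.records))
```

Indexing by keyword keeps a later local keyword search down to a single dictionary lookup per keyword.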

If the session scope is global in nature, the MSDd server creates a 'remote-register'

[36] protocol message for each of the keywords and routes the request to remote
domains using the hash routing table maintained locally. If a few such requests

are routed to 'self' then the MSD server creates a session record for that keyword


In the previous two chapters, Chapter 2 and Chapter 3, we have seen two different

issues facing IP multicast acceptance by an average end user, namely the ability to

locate relevant multicast streams and secondly, a convenient mechanism to remember,

bookmark and access a favorite stream in future. Chapter 2 dealt with a structured

proposal that allows an average user to locate a multicast stream along the similar lines

of keyword based web searches. Chapter 3 proposed a mechanism that would allow

the multicast streams to be assigned a mnemonic name, just like web page URLs and

domain names. Use of mnemonics will greatly improve the recall-ability of stream names

as compared to unusable network IP addresses typically assigned to such streams.

Those two chapters dealt with the issues in isolation from each other. In this chapter we

will present the complete system architecture that merges the resources described in

Chapter 2 and 3 into a seamless global system that improves the overall usability of

multicast technology.

4.1 Revisiting Objectives

Before we delve deeper into the integration of search and the assignment of URLs to

multicast sessions, let us revisit the goals that this dissertation started out with.

We want to develop a structured, globally scalable, distributed service architecture

that allows end users to seamlessly search and discover desired multicast sessions,

bookmark sessions for later use in such a way that even if the multicast parameters

change later on, the user bookmarks would remain valid. From a content host's

perspective, the system goal is to minimize the latency between session creation

and its discoverability by the users. In Chapter 2 we discussed a tree DHT scheme and

described how keyword routing is done within the DHT structure. We described how

such a scheme allows fast session searches. In Chapter 3 we described a scheme that

leverages the DNS hierarchy in assigning URLs to the multicast sessions. Combining

Can a usable mnemonic be assigned to a transient multicast address? There are

several challenges that have to be tackled before such a solution can become feasible.

Since there is no directive for use of certain groups of multicast addresses, a content

provider using multicast technology could change the group address as he wishes. The

name translation service must address such a scenario so that the mnemonic directs

the end users to the most up-to-date session details. In this chapter let us look in detail

at the proposal that aims to solve the issue of mnemonic assignments to multicast

groups. Such a scheme along with content search and discovery capability for multicast

contents can help improve the multicast usability significantly.

3.2 Domain Name Service

Domain Name Service is the hierarchical distributed database that provides

name resolution service to end user applications like email clients, browsers and

several others. Mnemonics such as URLs and domain names are preferred by end users

over dotted-decimal IP addresses. To a network router, however, a mnemonic provides no

aid in determining where to route a data packet, because the mnemonic hides any location

information. Routers prefer IP addresses since they are hierarchically interpreted from

left to right to determine the direction in which to forward the packet, thus bringing it closer to

the destination. DNS provides the translation service from mnemonics to IP addresses

thereby allowing the human usable addresses to be mapped to the router usable format.

Let us look at the details of the DNS service.
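The left-to-right, hierarchical interpretation of IP addresses by routers is commonly realized as longest-prefix matching. The sketch below illustrates that idea with Python's standard `ipaddress` module; the forwarding table and the next-hop labels are invented for the example and do not describe any real router.

```python
import ipaddress


def next_hop(destination: str, forwarding_table: dict):
    """Pick the next hop whose network prefix matches the most leading bits."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in forwarding_table.items() if address in net]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


# Hypothetical forwarding table: network prefix -> next-hop label.
table = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
    ipaddress.ip_network("10.0.0.0/8"): "core",
    ipaddress.ip_network("10.1.0.0/16"): "campus",
}

if __name__ == "__main__":
    for dest in ("10.1.2.3", "10.9.9.9", "8.8.8.8"):
        print(dest, "->", next_hop(dest, table))
```

A domain name carries no such prefix structure, which is why the translation step provided by DNS is needed before a packet can be forwarded.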

3.2.1 DNS Hierarchy

At the very top in the hierarchy are the 13 root servers named A through M. They

are named as Below the root level lie several top level domain

(TLD) servers. These include TLD servers for com, org, edu and all the country TLDs.

Below the TLDs are several organizations' or operators' authoritative or non-authoritative

DNS servers. Figure 3-1 shows the location and names of the 13 DNS root servers.

These servers are replicated for security and redundancy reasons.

Figure 5-6. Average control bandwidth - scenario 1 (axes: X Beta, Y Alpha, Z CBW)

Figure 5-7. Control bandwidth standard deviation - scenario 1 (axes: X Beta, Y Alpha, Z STDEV)

Figure 5-8. Average route switches - scenario 1 (axes: X Beta, Y Alpha, Z R-Switch)

Figure 5-9. Route switch standard deviation - scenario 1 (axes: X Beta, Y Alpha, Z STDEV)

Figure 5-4 shows the average hash-skew plot per domain for simulation hierarchy

type 1. Figure 5-5 shows the standard deviation in the hash skew values computed over

three experimental runs. Figure 5-6 shows the average control bandwidth usage per

domain in the network to maintain the domain hierarchy for simulation hierarchy type 1.

Figure 5-7 shows the standard deviation in the control bandwidth used per domain for

three runs of the simulation.

Figure 5-8 shows the average routing table switch per domain for different values of

α and β for hierarchy type 1. Figure 5-9 shows the standard deviation in routing switches

delays to reach to the target destination. The simulation results and analysis of the

structure are presented later in Chapter 5. This chapter presents the necessary algorithms

for failure recovery, searches and other necessary operations supported in "mDNS".

As soon as a multicast session is registered at any domain, the details are routed to

appropriate destination in the hierarchy using keyword routing thereby immediately

making the session discoverable by end users. Use of geo-coding adds an extra

dimension to search that proves useful to end users who may be interested in finding

contents that are either locally hosted or are on a regionally significant topic.

The architecture, as it has been implemented in the application layer as an overlay,

achieves independence from lower layer details and can be incrementally deployed. Even if a

'mDNS' domain is not linked to the global hierarchy, it can still provide valuable directory

services to the domain's local end users. In conjunction with URS it can allow users to

search and bookmark their popular multicast contents for later viewing.

Without loss of generality, assume that the root node does not participate as an MSD

server (possibly a TLD server). Then the share of hash-space allotted to each child

domain at the root level will be $\frac{n_i}{N} \times 2^{m} \times 2^{(128-m)}$. Therefore, for $child_i$:

$$share_i = \frac{n_i}{N} \times 2^{128} \tag{5-7}$$

Further, each child node reallocates its assigned hash-space among its children and

itself; the space must be divided equally into $n_i$ shares, and thus each participating

domain's designated-MSD server share comes out to:

$$share_{MSD} = \frac{n_i}{N} \times 2^{128} \div n_i, \quad \text{or} \quad \frac{1}{N} \times 2^{128} \tag{5-8}$$

This of course is valid provided the domain hierarchy remains stable over time. As

new domains may be added and some domains may leave the mDNS hierarchy over

time, there could be times when the above equitable distribution might be violated for

short durations. This situation should not arise frequently and we conjecture, it would

mostly occur during bootstrapping process. This minor turbulence in stable equitable

distribution occurs because of the way Algorithm 2 (see Chapter 2) has been designed

to minimize routing instability and to reduce frequent routing flux.
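The equitable split can be checked numerically. The sketch below reads equations (5-7) and (5-8) as $share_i = \frac{n_i}{N} \times 2^{128}$ and $share_{MSD} = \frac{1}{N} \times 2^{128}$, where $n_i$ is taken to be the number of MSD servers in the subtree of child $i$ and $N$ the total number of MSD servers; that reading of the symbols, the function names and the subtree counts are assumptions made for illustration.

```python
from fractions import Fraction

HASH_SPACE = 2 ** 128  # size of the 128-bit hash space


def child_share(n_i: int, total_msd: int) -> Fraction:
    """Hash-space share handed to a child subtree containing n_i MSD servers."""
    return Fraction(n_i, total_msd) * HASH_SPACE


def per_msd_share(n_i: int, total_msd: int) -> Fraction:
    """Share of one designated MSD server after the subtree splits equally."""
    return child_share(n_i, total_msd) / n_i


# Hypothetical hierarchy: three child subtrees with 3, 5 and 2 MSD servers.
subtree_counts = [3, 5, 2]
N = sum(subtree_counts)
shares = [per_msd_share(n, N) for n in subtree_counts]

if __name__ == "__main__":
    for n, share in zip(subtree_counts, shares):
        print(f"subtree with {n} MSDs: per-MSD fraction = {share / HASH_SPACE}")
```

Exact rational arithmetic is used so that the equality of the per-server shares is not blurred by floating point rounding.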

Let us analyze the workload due to routing of search and registration requests to

appropriate MSD servers. Clearly, a node that comes higher up in the tree hierarchy

must carry out more routing responsibilities compared to a node that is located close to

the leaf domains. At any node in the routing tree, $node_i$, suppose there are $m$ children

domains; then the routing load at that particular $node_i$ becomes

$$load_i = \frac{1}{N} \times \left( \sum_{j=1}^{m} count_j + 1 \right) \times 100\% \tag{5-9}$$

where $count_j$ is the MSD count propagated to $node_i$ from its child sub-domain $j$. This of

course assumes stable and equitable hash space distribution.

Now if every keyword is equally likely to be searched over a long period of time, then

the workload on an mDNS node $node_i$ over a duration of time $t$ becomes:

$$workload_i = t \times rate_{query} \times probability_{range} \tag{5-10}$$

$$probability_{range} = \frac{share_{MSD}}{2^{128}} \tag{5-11}$$

Now using equation (5-8) in equation (5-11), we get:

$$workload_i = t \times rate_{query} \times \frac{1}{N} \tag{5-12}$$

which shows that the search related workload is also generally equitable provided that

the keywords are searched with equal likelihood. Over short durations some

keywords are more popular than others; however, the trend over significantly longer periods

of time remains to be seen.
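A small numeric example may help with equations (5-9) and (5-12). The sketch below reads (5-9) as the fraction of all $N$ MSD servers that sit at or below a node, and (5-12) as the node's $\frac{1}{N}$ share of all queries; this interpretation, the function names and the sample numbers are assumptions made for illustration.

```python
from fractions import Fraction


def routing_load_percent(child_counts, total_msd: int) -> Fraction:
    """Routing load of a node: (sum of child MSD counts + 1) / N, in percent."""
    return Fraction(sum(child_counts) + 1, total_msd) * 100


def search_workload(duration_s: int, query_rate_per_s: int, total_msd: int) -> Fraction:
    """Expected number of searches one MSD answers when keywords are equally likely."""
    return duration_s * query_rate_per_s * Fraction(1, total_msd)


if __name__ == "__main__":
    # Hypothetical node with two child sub-domains reporting 3 and 5 MSD servers,
    # in a hierarchy of N = 10 MSD servers overall.
    print("routing load:", routing_load_percent([3, 5], 10), "%")
    print("leaf routing load:", routing_load_percent([], 10), "%")
    # One hour at 50 queries per second across the whole system.
    print("searches handled:", search_workload(3600, 50, 10))
```

The routing load grows towards the root of the tree, while the search workload stays the same for every MSD server under the equal-likelihood assumption.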

5.4.3 A Comparison with Other DHT Schemes

Let us now argue in favor of our hierarchical DHT overlay scheme, and for the unsuitability

of other DHTs, for the mDNS architecture. One of the design goals of mDNS has been the

ability to assign long term URLs to the multicast streams registered with the service.

That necessitated a close design correspondence with existing DNS infrastructure.

Design criteria such as ability to filter out administratively scoped sessions from being

sent out to requesting users external to that domain led us to design the system along

domain hierarchy. Doing so made the search algorithm and the session database

management efficient and simple. But this choice led to a design that deviates from

typical P2P design. mDNS DHT overlay design is not a P2P design. In a typical peer to

peer architecture, every node is assumed to have similar responsibility and share the

same workload. Typically all nodes in a DHT based P2P scheme are at the same level.

Therefore, mDNS DHT is not a peer to peer design although it incorporates several

design principles found in a typical P2P DHT design.

Layered transmission and caching for the multicast session directory

In this published work [44] the authors, in an effort to enable layered multimedia

transmission to receivers with varying capabilities, proposed modifications to "sdr" in the form of

a two stage session directory service: a persistent server that caches

SAP announcements and an ephemeral client that contacts the server to get the session

list, thereby reducing the long latency normally associated with "sdr". The usual problems

with "sdr" still persist: users have to browse through a long session list in order to find a

session of interest.

Towards multicast session directory services

In their article titled "Towards Multicast Session Directory Services" [45], the authors

have reflected on the limitations of session directory based on Session Description

Protocol and Session Announcement Protocol. They have argued that although the "sdr"

approach is not scalable, session discovery can be made better by standardizing

the additional attributes in SDP so that they can be organized and indexed in a separate

server that would provide Multicast Session Directory Service (MSDS) to end users.

These MSDS servers can then disseminate information on a single well known multicast

channel or multiple theme based multicast channels to the end users.

IDG: information discovery graph

The Information Discovery Graph (IDG) [46] that has been developed as part of

Semantic Multicast project strives to provide a self organizing, hierarchical distributed

cache where multimedia sources register their content information and the topical

managers intelligently determine where in the hierarchy to store the session information.

Their approach still makes use of SAP like periodic announcements which is a waste

of bandwidth even though these announcements are mainly on the content managers

hierarchy information. It is still not clear how IDG enables end users to perform multicast

session search based on multiple keywords. How is the additional network hardware

proposed in this text that enables URLs to be assigned to multicast streams and co-exist

seamlessly with the current DNS architecture. It is referred to as "mDNS" in this text and

the intended usage of this name is to emphasize DNS like name resolution capability for

transient multicast streams.

The aim of this work is to herald an era of true end user multicast use. If proper

infrastructure support is provided, one can imagine several interesting use cases

emerging from it. Real-time, truly scalable citizen iReporting, which would instantly

provide video feed to millions of viewers worldwide, could be one such use case.

Disaster preparedness and disaster management could be made easier. All this would

require an architecture that would allow new multicast sessions to be made discoverable

to others in a real time fashion. The proposed architecture achieves this very goal.

Some skeptics may ask 'Why not use Google or Yahoo search to achieve session

discovery?'. The answer for that can be found in section

1.3 Conclusion

This chapter described in brief basic building blocks for IP multicast. It gave

arguments for use of IP multicast. The benefits that multicast provides to content

providers, end users, core and fringe ISPs and other service enablers like CDNs

were also discussed. It described addressing for multicast, provided brief overview on

multicast routing and shared as well as dedicated distribution trees. It also described

IGMP in some detail. IGMP allows end users to join or leave a multicast group. It

also talked about reasons why multicast has not taken off in the same manner as

unicast even though in many data delivery situations it performs better and has a better

price/bandwidth ratio. The following chapters delve deeper into the issues raised here in

this chapter and their solutions and validation.








Figure 5-29. Range chart for latency experiments


Figure 5-30. Median Latency

Figure 5-31. Average Latency

registration again at randomly chosen domain/node. These two random selections were

independent of each other.

Results Interpretation

For easy interpretation of results, we combined the three data, namely, number

of route updates, time taken to stabilize and hash 'skew', using weights of 0.5, 0.3

and 0.2 to arrive at a weighted score. Lower weighted score represents better




O(logN) for a N node overlay. The routing is based on the successor relationship, as

long as a node knows its predecessor in the key space, any node can compute what

keys are mapped onto it.

OpenDHT

OpenDHT [53] is a free and shared DHT deployment that can be used by a multitude

of applications. The design goals focus on adequate control over storage allocation

mechanism so that each user/application gets its fair-share of storage and somewhat

general API requirements so that the overlay can be used by a broad spectrum of

applications. It provides a persistent storage semantics based somewhat on the

Palimpsest shared public storage system [54]. The implementation provides a simple

put/get based API for simple application development and a more sophisticated API

set called ReDiR. The main focus in this DHT scheme is starvation prevention and fair

allocation of storage to applications. Competing applications' keys are stored using

unique name-spaces assigned to each application. Keyword routing in OpenDHT is tree

based and is done hierarchically. Details can be found in [55].

Tapestry

Tapestry [49] [56] is a DHT overlay where routing is done according to the digits in

the node address. At each routing step, the message is routed to a node whose address

has a longer matching address prefix than the current node. The routing scheme is very

similar to scheme presented by Plaxton [57] with support for dynamic node environment.

In their scheme they propose using salts to store objects at multiple roots thus improving

availability of data in their scheme. They use neighbor maps to incrementally route

messages to destination ID digit by digit. The neighbor map entry in Tapestry has space

complexity $O(\log_b N)$ where 'b' is the base for node IDs. The Tapestry scheme uses several

backpointers to notify neighbors of node additions or deletions. Several successful

applications that use Tapestry for message routing have been developed. Notable

4.3 Use of Caching

The delay incurred in resolving the target MSD connection details before sending a

'redirect' to the search client can be significantly reduced if the target MSD information

for a popular keyword is cached locally. Let us justify why MSD connection details are

suitable for caching.

In mDNS, once the hash-space allocation and hash-routing construction phase

stabilizes, the MSD connection details become stable as well. Unless many domains

join and leave the mDNS hierarchy in an arbitrary fashion, the hierarchy as well as

hash space allotment remains stable. One way a target MSD may change even if the

hierarchy itself is stable, is if the designated MSD server fails. In this case if a backup

MSD server is running, it will soon become the designated MSD server (after a fresh

leader election) and thus the IP address will change. But we expect such cases to be

very rare. These arguments make MSD connection details an excellent candidate for caching.


With caches in place, when an end-user requests a keyword search for multicast

sessions, the domain-local MSD server checks the cache. If there is a cache hit, then

it immediately sends the cached connection details for the target MSD server to the

requesting end-user. The end-user tries to connect to the remote target MSD server;

if it succeeds, the delay incurred is reduced significantly. If it fails, most likely due to

changed connection information in the target domain (due to primary MSD server

failure), or if the target domain is not responsible for the keyword due to more recent

hash space reassignments (likely caused by network topology changes), the

end-user prompts the domain-local MSD server to invalidate the stale entry. The

original two-pass protocol is then used, which refreshes the stale entry and the process

continues from there.
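The cache behaviour described above (answer immediately on a hit, fall back to the probe on a miss, and refresh the entry when the client reports it as stale) could be sketched as below. The class name, method names and the injected `probe` callable are hypothetical; `probe` merely stands in for the original two-pass 'msd-probe' exchange.

```python
class RedirectionCache:
    """Keyword -> target MSD connection details, with explicit invalidation."""

    def __init__(self, probe):
        self._probe = probe   # callable: keyword -> connection details
        self._entries = {}

    def lookup(self, keyword):
        """Return (connection details, was_cache_hit); probe only on a miss."""
        if keyword in self._entries:
            return self._entries[keyword], True
        details = self._probe(keyword)
        self._entries[keyword] = details
        return details, False

    def invalidate(self, keyword):
        """Drop a stale entry and refresh it through the probe."""
        self._entries.pop(keyword, None)
        return self.lookup(keyword)[0]


if __name__ == "__main__":
    calls = []

    def fake_probe(keyword):
        # Pretend each probe discovers a (possibly new) designated MSD server.
        calls.append(keyword)
        return f"msd-{len(calls)}.example:5000"

    cache = RedirectionCache(fake_probe)
    print(cache.lookup("news"))      # miss: the probe is sent
    print(cache.lookup("news"))      # hit: answered from the cache
    print(cache.invalidate("news"))  # client reported a stale entry
```

Keeping invalidation client-driven matches the text: the server does not poll remote domains, and only pays the probe cost again when a cached entry is actually found to be wrong.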

[Figure panels: Scenario 2, domains 10, 1 and 2]




Piyush Harsh was born into a well educated and scientifically oriented family.

His father, who is an M.D., has been the biggest influence on him and instilled scientific

curiosity in him right from his early childhood. He was always a good student, excelling in

studies in his high school. All his hard work paid off when he got a chance to go and

study at Indian Institute of Technology, Roorkee. He graduated with a bachelor's degree

in Computer Science and Technology in Spring 2003.

To further his scientific training he decided to accept a full scholarship from the University

of Florida and joined the Ph.D. program at the Department of Computer Science and

Engineering in Fall 2003 and came to the US. He was the first one ever to travel abroad

for higher education in his family.

Under the able guidance of Dr. Richard Newman (his adviser) and his Ph.D.

committee members, especially Dr. Randy Chow, he was involved in numerous scientific

projects. During his stay at University of Florida, he worked in the fields of security,

computer networks and cognitive computing. Lately his research interest has been

focused on bio-inspired network models including ways to adapt models of human brain

into future network design.

When he is not doing research work, he enjoys outdoor activities including long

distance trail biking and hiking in nature reserves. He believes in preservation of

the environment and aspires to be an active participant in this noble cause in the near future.

Figure

1-1 Data transmission in unicast v multicast
1-2 Perceived data rate in unicast v multicast
1-3 Bandwidth requirements vs number of recipients in unicast and multicast
1-4 Multicast address format in IPv6
1-5 IGMP v3 packet format membership query
1-6 IGMP v3 packet format membership report
2-1 Local and global session records structure
2-2 A general domain hierarchy
2-3 Example routing table structure
2-4 Steps in DHT domain addition
2-5 DHT record insertion example
2-6 Global sessions database design
2-7 Geo-tagged database design
2-8 Screenshot session registration tool
2-9 Session registration
2-10 Session search
2-11 Parent node failure recovery strategy
3-1 Location and names of DNS root servers [source: ICANN]
3-2 Typical steps in 'mDNS' URI name resolution
4-1 A typical mDNS domain components
4-2 A typical mDNS hierarchy in ASM network
4-3 A mDNS hierarchy in mixed network operation mode
5-1 Screenshot mDNS auto simulator program
5-2 Screenshot mDNS latency measurement tool
5-3 Various network topologies chosen for simulation

Table 3-1. Common DNS record types

Record Type   Name             Value
A             hostname         IP address
NS            domain           authoritative DNS server name
CNAME         alias hostname   canonical hostname
MX            alias hostname   canonical name of mail server

3.3 URL Registration Server

The main task of the URL Registration Server (URS) is to ensure uniqueness among

all registered session identifiers within a particular domain. Further, it also acts

as a bootstrapping device for MSD servers running in that domain. System

administrators need to set only a few configurable parameters in the URS; the rest

of the components in 'mDNS' are self-configuring. Just as a DNS server can be

replicated for security and redundancy, so can a URS in a domain. The DNS

server in a particular domain has an 'A' record for the URS. The name 'mcast' is used. A

typical DNS record entry file may look something like this -

$TTL 604800

@ IN
mcast IN (
2006020249 ; Serial
604800 ; Refresh
86400 ; Retry
2419200 ; Expire
604800); Negative Cache TTL

10 mail

Considering the above example of DNS settings, one can access the URS using URL string. If multiple URS servers are maintained at a domain,

the DNS server load balancing feature might be used to handle high traffic situations.

Let us now take a look at the URS components.

The 'mDNS' proposal uses a distributed tree DHT structure to distribute the session records over a small number of nodes that can be queried for the desired multicast records. The architecture scales well and has a low maintenance (communication) overhead.

4.6.2 Existence in Present Network Environment

Network components are expensive and extremely difficult to upgrade. Network administrators generally do not want to change the core routing infrastructure, as it is a cumbersome task. Any new proposal that aims to be deployed quickly must be able to work in the existing network environment. Changes in the network stack are especially difficult to effect. The 'mDNS' architecture is an overlay structure implemented entirely in the application layer. The proposal requires no change in the existing network stack and needs no network hardware upgrades.

4.6.3 Real Time Session Discoverability

sdr and several similar proposals, because of the bandwidth limitations imposed by the SAP/SDP protocols, had a very slow rate of session record propagation to remote sdr clients. From a content provider's perspective, this could be frustrating. In the proposed architecture, the session details are routed over the DHT hierarchy immediately, making them searchable and thus discoverable by end users right away. This service feature becomes critical for sessions that are extremely transient in nature. 'mDNS' achieves this requirement.

4.6.4 Ability to Perform a Multi-Parameter Search

Every multicast session in the proposed scheme can be tagged with up to 10 descriptive keywords and is additionally geo-tagged with its relevant location. The proposed architecture allows an end user to perform a multi-parameter search and supports all major boolean operators for combining search parameters. The scheme also allows users to narrow down search results based on their geographical preferences.
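As a minimal sketch of what such a multi-parameter search could look like, the following Python fragment filters an in-memory list of sessions by AND, OR and NOT keyword terms plus an optional geographic bounding box. The `Session` class, the `search` function and its parameter names are illustrative assumptions and are not taken from the 'mDNS' implementation; only the 10-keyword limit comes from the text.

```python
from dataclasses import dataclass

MAX_KEYWORDS = 10  # per-session keyword limit stated in the text


@dataclass(frozen=True)
class Session:
    """Hypothetical in-memory session record used only for this sketch."""
    name: str
    keywords: frozenset
    latitude: float
    longitude: float

    def __post_init__(self):
        if len(self.keywords) > MAX_KEYWORDS:
            raise ValueError("a session may carry at most 10 keywords")


def search(sessions, all_of=(), any_of=(), none_of=(), bbox=None):
    """Return sessions matching AND / OR / NOT keyword terms.

    bbox, if given, is (min_lat, max_lat, min_lon, max_lon) and narrows
    the result to sessions geo-tagged inside that box.
    """
    hits = []
    for s in sessions:
        if not set(all_of) <= s.keywords:          # AND: every term present
            continue
        if any_of and not (set(any_of) & s.keywords):  # OR: at least one term
            continue
        if set(none_of) & s.keywords:              # NOT: no excluded term
            continue
        if bbox is not None:
            min_lat, max_lat, min_lon, max_lon = bbox
            inside = (min_lat <= s.latitude <= max_lat
                      and min_lon <= s.longitude <= max_lon)
            if not inside:
                continue
        hits.append(s)
    return hits
```

A linear scan is used here purely for readability; the architecture described in this dissertation routes each keyword to the domain responsible for its hash value instead.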


Figure 2-10. Session search

Figure 2-10 shows the general scheme behind global search support in the presented architecture. The end user submits the session search query to the domain-local MSDd server. The MSDd server parses the query, and if the scope of the search is 'administrative' only, then only the 'Local Session Records' database is searched and the matching sessions are returned to the requesting party.

However, things get somewhat more complicated in the case of a global search. A naive approach would be to flood the query to all participating domains. Because of the DHT tree structure and keyword routing using hash values, the search is more efficient. MSDd parses the query string and transforms a single search query into multiple 'msd-probe' protocol messages [36], one for each unique keyword present in the search

Figure 5-14. Average control bandwidth, scenario 2 (axes: X beta, Y alpha, Z CBW)

Figure 5-15. Control bandwidth standard deviation, scenario 2 (axes: X beta, Y alpha, Z STDEV)

Figure 5-16. Average route switches, scenario 2 (axes: X beta, Y alpha)

Figure 5-17. Route switch standard deviation, scenario 2 (axes: X beta, Y alpha, Z STDEV)

Figure 5-12 shows the average hash-skew per domain for different values of α and β for simulation scenario 2. Similarly, Figure 5-14 shows the plot for average control bandwidth in bytes/second for simulation scenario 2, Figure 5-16 depicts the plot for average routing table switches, and Figure 5-18 shows the stabilization time (in seconds) for routing flux to subside for simulation scenario 2. Figures 5-20, 5-22, 5-24, and 5-26 show the same plot types for simulation scenario 3. Our simulation run for each type covers α and β values from 0.1 to 2.0 with step size 0.2, with α < β.

prefix of its ID. Queries can be routed to their destination because each node knows a node in each of a series of successively lower subtrees that do not contain the node where the query arrived. This results in a query being routed in a logarithmic number of steps.

The routing table in Kademlia is arranged in k-buckets of nodes whose distance from the node itself lies between $2^{i}$ and $2^{i+1}$, for 0 ≤ i < 160. 'k' is a design parameter, which the authors chose as 20. The routing table is itself logically arranged as a binary tree in which each leaf is a k-bucket. Each k-bucket covers some range of the ID space, and together they cover the entire 160-bit ID space.

CAN: a scalable content addressable network

CAN [39] is an overlay in which the node space is a d-dimensional coordinate space. At any time, the coordinate space is completely partitioned among all N participating nodes. Each key in CAN is mapped to a coordinate in the CAN coordinate space and thus to the node managing the zone within which the key lies. Routing is done by forwarding the message to the neighboring node whose coordinate is closest to the destination coordinate. A CAN node maintains a coordinate routing table that holds the virtual coordinate zone of each of its immediate neighbors only. If there are n nodes that divide the whole coordinate space into n equal zones, then the average routing path length in CAN is $(d/4)(n^{1/d})$ hops, and individual nodes maintain 2d neighbors for a d-dimensional coordinate space. The authors propose using multiple 'realities' along with multiple peers in each zone and multiple hash functions for routing optimizations and for improving the overall availability of data in their scheme.
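To make the CAN scaling figures quoted above concrete, the short Python sketch below evaluates the average path length $(d/4)(n^{1/d})$ and the per-node neighbor count 2d for a given number of equal zones and dimensions. The function names are hypothetical and chosen only for this illustration.

```python
def can_avg_path_length(n: int, d: int) -> float:
    """Average routing path length in hops for n equal zones in d dimensions."""
    if n < 1 or d < 1:
        raise ValueError("n and d must be positive")
    return (d / 4) * n ** (1 / d)


def can_neighbors(d: int) -> int:
    """Number of immediate neighbors each node tracks in a d-dimensional CAN."""
    if d < 1:
        raise ValueError("d must be positive")
    return 2 * d
```

For example, with n = 4096 zones, moving from d = 2 to d = 3 shortens the average path from about 32 hops to about 12 hops, while the neighbor state per node grows from 4 to 6 entries.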

2.5 Conclusion

This chapter provides a detailed discussion of the DHT scheme and the keyword routing scheme that form the backbone of multicast session search and discovery in "mDNS". The scheme presented is adaptable to changes in topology with a major goal

to distribute storage evenly across all participating domains and reducing the routing

search client in the remote domain is pointless. Algorithm 4 shows what happens when

an 'ext-search' message is received at the target MSDd server.

Algorithm 4: MSD external search algorithm
begin
    {Incoming: search query 'ext-search' from an end user in an external domain}
    // only globally scoped sessions are returned
    keyword_hash ← hash(keyword)
    if keyword_hash lies within the assigned hash space then
        {search 'Global Session Records' database for candidate sessions}
        {if needed, cross-reference with 'Geo-DB' database}
        {send qualifying session records as search response back to the remote client}
    else
        {return 'ext-search-invalid' message to the remote client}
    end
end
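The control flow of Algorithm 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MSDd implementation: the hash function (SHA-1 reduced to a 16-bit space), the inclusive `(low, high)` representation of the assigned hash range, the dictionary used as the 'Global Session Records' database, the `geo_filter` callback standing in for the 'Geo-DB' cross-reference, and the 'ext-search-response' message name are all hypothetical. Only 'ext-search-invalid' is named in the text.

```python
import hashlib

HASH_SPACE = 2 ** 16  # assumed size of the keyword hash space


def keyword_hash(keyword: str) -> int:
    """Map a keyword to a point in the hash space (assumed hash function)."""
    digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % HASH_SPACE


def handle_ext_search(keyword, assigned_range, global_records, geo_filter=None):
    """Handle an 'ext-search' query arriving from an external domain.

    assigned_range is the inclusive (low, high) hash range this server owns.
    global_records maps a lower-cased keyword to globally scoped sessions.
    Returns a (message_type, records) pair.
    """
    low, high = assigned_range
    if not (low <= keyword_hash(keyword) <= high):
        # The keyword does not route to this server.
        return "ext-search-invalid", []
    candidates = global_records.get(keyword.lower(), [])
    if geo_filter is not None:
        # Optional geographic narrowing of the candidate sessions.
        candidates = [record for record in candidates if geo_filter(record)]
    return "ext-search-response", candidates
```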

For the search operation to work properly, the 'mDNS' hierarchy must remain connected. But domains may end their participation in the 'mDNS' architecture or may crash for an unknown duration. Some of these domains could have child domains below them, and their failure would leave the child nodes with no mechanism to forward messages to the next higher layer using the keyword-routing scheme. Let us now look at a failure recovery strategy that deals with this very problem.

Recovering from parent node failures

A parent node periodically sends a heartbeat message to all its children over the CMCAST child multicast channel, or via unicast if some children are not able to receive multicast messages from it. If a child node was initially subscribed to receive the parent's communication over the PMCAST parent multicast channel (PMCAST at the child node is the same as CMCAST at the parent node) and does not receive any parent messages for a set number of consecutive timeouts, it tries to contact the parent node by unicast to ask it to switch its communication to unicast. If this process fails, or if the child was already subscribed to receive the parent's communication over the unicast channel and did not get any heartbeat message for a set number of consecutive timeouts, it initiates parent node

\[
\frac{\pi}{180^{\circ}} \cos\phi \, \sqrt{\frac{a^{4}\cos^{2}\phi + b^{4}\sin^{2}\phi}{(a\cos\phi)^{2} + (b\sin\phi)^{2}}} \tag{2-1}
\]

Equation (2-1) gives the east-west distance covered by each one-degree change in longitude at latitude $\phi$, with a = 6,378,137 m and b = 6,356,752.3 m.
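As a quick numerical check of equation (2-1), the following Python sketch evaluates the expression as written, using the values of a and b given in the text. The function name is a hypothetical choice for this illustration.

```python
import math

A = 6_378_137.0   # a in metres, as given in the text
B = 6_356_752.3   # b in metres, as given in the text


def metres_per_degree_longitude(lat_deg: float) -> float:
    """East-west distance in metres of one degree of longitude, per equation (2-1)."""
    phi = math.radians(lat_deg)
    c, s = math.cos(phi), math.sin(phi)
    # Square-root term of equation (2-1).
    radius = math.sqrt((A ** 4 * c ** 2 + B ** 4 * s ** 2)
                       / ((A * c) ** 2 + (B * s) ** 2))
    return math.pi / 180.0 * c * radius
```

At the equator the expression reduces to (π/180)·a, roughly 111.3 km per degree, and it shrinks toward zero at the poles, which is why grid cells of one degree cover less ground at higher latitudes.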

Figure 2-7. Geo-tagged database design

Under the current grid map, where the major lines are latitudes and longitudes spaced 1° apart, the earth can be mapped into a 180x360 grid space. Since almost 70% of the earth's surface is covered by water, about 70% of the grid locations would naturally map to water bodies. Of the remaining 30% that is landmass, research shows that only 50% of the land area is inhabited by humans. Therefore, we foresee only about 15% of all grid locations ever being used to group multicast sessions according to their geo-tags, and a sparse-matrix implementation of the planetary grid seems reasonable.
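The 180x360 grid has 64,800 cells, so the 15% estimate corresponds to roughly 9,720 cells ever being populated. A minimal sketch of a sparse planetary grid, assuming a dictionary keyed by (row, column) as the sparse-matrix representation, could look like this in Python. The class and method names are illustrative and do not come from the dissertation.

```python
import math
from collections import defaultdict


class SparseGeoGrid:
    """Sparse 180x360 grid; only cells that hold sessions consume memory."""

    ROWS, COLS = 180, 360

    def __init__(self):
        self._cells = defaultdict(list)

    @classmethod
    def cell_of(cls, lat: float, lon: float):
        """Map a latitude/longitude pair to its one-degree grid cell."""
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError("coordinates out of range")
        # Clamp so that lat = 90 and lon = 180 fall into the last row/column.
        row = min(int(math.floor(lat + 90.0)), cls.ROWS - 1)
        col = min(int(math.floor(lon + 180.0)), cls.COLS - 1)
        return row, col

    def add(self, session_id: str, lat: float, lon: float) -> None:
        """Group a session under the cell that contains its geo-tag."""
        self._cells[self.cell_of(lat, lon)].append(session_id)

    def sessions_at(self, lat: float, lon: float):
        """Return the sessions in the cell containing the point (no allocation)."""
        return list(self._cells.get(self.cell_of(lat, lon), []))

    def used_cells(self) -> int:
        return len(self._cells)
```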

provided to the content creator so that if any session parameter changes, the tool

automatically updates the local URS.

3.3.2 mDNS Name Resolution

Assuming that each domain has a DNS server running and a valid FQDN assigned to the network domain, one can construct a unique URI for every multicast session that has a unique identifier registered with the URS. The URI is relative to the URS server's URI. For example, if the FQDN for a domain is dom1.somenetwork.example, and the URS server has an 'A' entry in the DNS server with value 'mcast', then the URI of the URS becomes mcast.dom1.somenetwork.example. Furthermore, if a multicast session creator has registered a unique identifier 'channelx' with this URS server, the URI for his/her multicast stream in this architecture becomes -

mcast.dom1.somenetwork.example/channelx
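The construction of such a URI, and its decomposition into the part resolved through DNS and the part looked up at the URS, can be sketched as follows. The helper names are hypothetical; the 'mcast' label and the example domain are the ones used in the text.

```python
def build_session_uri(domain_fqdn: str, identifier: str, urs_label: str = "mcast") -> str:
    """Build '<urs_label>.<domain_fqdn>/<identifier>' for a registered session."""
    if not domain_fqdn or not identifier or "/" in identifier:
        raise ValueError("need a domain FQDN and a slash-free identifier")
    return f"{urs_label}.{domain_fqdn}/{identifier}"


def split_session_uri(uri: str):
    """Split a session URI into (URS host name, URS identifier).

    The host name is what standard DNS resolution handles; the identifier
    is what the client then requests from the URS.
    """
    host, sep, identifier = uri.partition("/")
    if not sep or not host or not identifier:
        raise ValueError("expected '<urs-host>/<identifier>'")
    return host, identifier
```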

Every participating domain must have a URS installed and operational. Now let us see how, when an end user accesses a bookmarked multicast session, the architecture is able to resolve the URI and let them access the multicast stream. Figure 3-2 shows the steps involved.

Let us say the user is trying to access a multicast video stream that has an 'mDNS' URL. The base URL string is resolved using the

standard DNS name resolution algorithm. The name resolves to the URS operating

in the domain. The end user client software then requests the URS for the

record associated with the identifier gators. The content provider has already registered

the session details with the 'URS-identifier' set to gators and so the URS locates the

relevant record and sends it back to the end user. This record has all the necessary

parameters needed by the multicast stream receiver to join the relevant session. If the

record gators was not found at the target URS, the name resolution would have failed. In

that case a 'Resource Not Found' type error message will be displayed at the end user's


routers to start forwarding data packets for the multicast group they are interested in receiving data from, using IGMP report messages. The routers take appropriate action to join the distribution tree (RP-based or source-specific) if they have not already done so for some other host in that same multicast group. Now that we have briefly seen the essential building blocks for enabling native multicast to be deployed in the network, let us try to understand the reasons for its lack of mass deployment and user acceptance in the Internet.

User's perspective: low usability

In the case of unicast, because of the longevity of the address assignments to hosts, it has been possible to assign aliases to IP addresses and set up a global name resolution system to resolve these aliases to the associated IP addresses. Universal Resource Locators (URLs) are the address aliases that have made the Internet more usable for an average user. DNS [32] is the name resolution service that maps FQDNs to the appropriate IP addresses and in turn makes the use of URLs possible. As most of the resources on the Internet are stable resources with long-term availability, end users can bookmark FQDNs and URLs for future use.

Long-term stability of resources and their availability has another benefit: they can be indexed by search engines using web crawlers. This allows users to locate content over the Internet using keyword searches. Keyword searches offered by search engines like Yahoo and Google, along with the use of URLs and FQDNs, have helped improve the usability of the web for an average user.

As documented earlier, the lack of prior knowledge of group composition and of the time and duration of a group's existence, along with the absence of any explicit restriction on the use of a group address other than the general classification enforced by the IANA, presents several challenges that are absent in the case of unicast.

1. unstable group address: because of no long term stability associated with
multicast addresses assigned to user groups, these can not be used with the
current DNS scheme. DNS has been designed with stability of FQDNs and


Figure 5-3. Various network topologies chosen for simulation

The above figure shows scenario 1, a somewhat balanced domain arrangement in the hierarchy. Scenarios 2 and 3 show the two extremes of the domain linkage scheme. Scenario 2 is the two-level scenario in which there is only one parent domain (flat arrangement, tree of height two), and scenario 3 is the other extreme in which all the domains are linked in a linear order (tree of height 10). In the figure, the direction of an arrow shows the relationship "is a child of", e.g., A -> B means A is a child of B.

For all three scenarios we configured the simulation controller to start the virtual domains according to the permutation [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] and the inter-domain delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. The value in permutation location i acts as a pointer to the position in the delay-list that holds the delay to wait before starting the next domain. The simulation controller thus acts as follows: it first starts virtual domain 10, looks into the 10th place in the delay-list, finds the value 30, waits for 30 seconds before starting virtual domain 4, and so on. Another set of values that we used in our simulation was the domain start-up permutation [10, 1, 4, 5, 2, 3, 6, 7, 8, 9] and delay values [5, 5, 5, 5, 5, 540, 5, 5, 5, 5].
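The start-up rule described above can be traced with a few lines of Python. This is a sketch of the scheduling logic only, not the simulation controller itself; the function name is hypothetical, and the start time of the first domain is taken as t = 0 seconds.

```python
def startup_schedule(permutation, delays):
    """Return (domain, start_time_in_seconds) pairs in start-up order.

    After starting domain p, the controller waits delays[p - 1] seconds
    (the delay-list is indexed by the 1-based domain number) before
    starting the next domain in the permutation.
    """
    schedule = []
    t = 0
    for domain in permutation:
        schedule.append((domain, t))
        t += delays[domain - 1]
    return schedule
```

With the first permutation and delay-list from the text, domain 10 starts at t = 0, domain 4 at t = 30, domain 5 at t = 40, and the 600-second entry for domain 6 pushes the start of domain 1 out to t = 670.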

5.3 Simulation Results

Table 5-1 shows a partial list of values of measured system parameters for scenario

1 hierarchy using the domain startup permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]

and inter-domain startup delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. Each row

Table 5-1. Partial simulation data for scenario 1 hierarchy for permutation list [10, 4, 5, 6, 1,
0 5.2 0 649.70
0.001 5.3 0 649.70
0.003 3.4 0 495.97
0 5.3 0 649.77
0 4.267 0.058 496.53
0.001 4.2 0 496.00
0.001 4.5 0 459.63
0 3.4 0 496.03
0.01 4.3 0 460.23
0.001 2.967 0.058 459.67
0.003 4.5 0 459.33
0 3.4 0 496.03
0.001 4.3 0 459.67
0.001 3 0 459.67
0 3 0 459.70
0.012 4.9 0 687.37
0.001 3.9 0 499.03
0.001 3.4 0 498.10
0 3.3 0 496.03
0.002 3.3 0 497.00
0.001 3.1 0 496.57
0 4.9 0 686.33
0.001 3.9 0 499.07
0.001 3.4 0 497.60
0 3.3 0 497.03
0 3.3 0 496.03
0 3.1 0 496.07
0 3.1 0 496.03
0.018 4.867 0.058 687.70
0 3.9 0 499.07






I would like to extend my gratitude to my adviser, Dr. Richard Newman, who has been more of a father figure and a friend to me than just an adviser. His thoughts on life and the numerous discussions I have had with him over the years on almost everything under the sun have helped me a great deal in becoming the person that I am today.

Special thanks to Dr. Randy Chow, who guided me and took time out of his very busy schedule when Dr. Newman took a sabbatical break. His meticulous approach to scientific quest and his knowledge of how the system works were an eye opener.

I would also like to thank all the friends I made during my stay at the University of Florida, including Pio Saqui, Jenny Saqui, InKwan Yu, Mahendra Kumar and others (you know who you are), for keeping me sane and grounded. All of you have been a very pleasant distraction. You will always be in my heart and mind.

I would like to thank all of the CISE office staff, especially Mr. John Bowers, for taking care of administrative details concerning my enrollment and making sure things proceeded smoothly for me. I would like to thank the CISE administrators with whom I had numerous discussions on the intricacies of managing a large computer network. Notable among them are Alex M. Thompson and Dan Eicher.

Lastly, I would like to acknowledge the UF Office of Research, ACM, the College of Engineering, and UF Student Government for providing me with numerous travel grants for attending conferences held all over the world.


Figure 3-1. Location and names of DNS root servers [source: ICANN]

3.2.2 DNS Name Resolution

DNS name resolution begins when an end user application sends a DNS resolution request to the local DNS client. The initial DNS server connection information is in many cases fed automatically to the client's machine through DHCP [63] [64]. The client-side DNS server asks the root servers for the address of the respective TLD DNS server. The TLD DNS server has an entry that points to the authoritative DNS server for the domain that has to be resolved. The local DNS client then queries the authoritative DNS server and gets the IP address of the mnemonic (domain name) to be resolved. DNS name resolution is done using both recursive and iterative resolution. Resolution proceeds iteratively until the query reaches the authoritative name server, and if there are local servers below that, the resolution proceeds recursively until the address record is located and sent back to the requesting DNS client. A DNS server maintains several record types in its internal database. Let us take a brief look at some of the common records that are stored as part of the DNS database.

3.2.3 DNS Records

DNS records are identified by their record types. Table 3-1 shows the most common record types. Now that we have seen the basics of a DNS server, let us see in detail how a URS is designed and how it achieves its intended goals.

3.3.1 URS Internals

Every URS maintains a URS records database. The record member elements are very similar to the MSD 'Local Session Record' structure shown in Chapter 2, with a few minor differences. The elements are listed here -

* Expiration Time: Time after which session records may be purged from the URS
* URS Identifier: Unique session identifier registered with the URS server (see Chapter 4)
* Channel IP: Multicast session IP
* Channel Port: Multicast session port
* Source IP: If the network type is SSM, this IP determines the content host machine
* Fail Over Unicast IP: Backup unicast stream source IP address (optional)
* Fail Over Unicast Port: Backup unicast stream port (optional)
* Channel Scope: Multicast stream scope (global/local)
* Geographical Common Name: Common name of the place associated with the session
* Latitude: Latitude value associated with the session
* Longitude: Longitude value associated with the session
* Network Type: Multicast compatibility of the session's hosting network (ASM/SSM)
* Stream Type: Identifies the nature and type of the multicast stream
* Preferred Application: Identifies the suggested application to be used to access the stream type
* CLI Arguments: Command Line Interface (CLI) arguments to be supplied to the preferred application
* Mime Type: MIME type of the stream data; it must be one of the IANA registered MIME types

This record is used in the 'mDNS' URI name resolution process. A URS maintains records only for sessions created in its own domain. Uniqueness of the 'URS Identifier' value is enforced only with respect to that domain. That is, no two sessions created in that domain and registered with the URS will have the same 'URS Identifier'.
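A minimal in-memory sketch of this behavior is given below: a record type carrying a subset of the listed fields, and a per-domain registry that rejects duplicate identifiers, supports updates and hides expired records. The class names, method names, default values and the choice of a dictionary as storage are assumptions made for illustration; the real URS record holds all of the elements listed above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class URSRecord:
    """Subset of the URS record elements (illustrative field names)."""
    urs_identifier: str
    channel_ip: str
    channel_port: int
    expiration_time: float            # seconds on some agreed clock
    network_type: str = "ASM"         # 'ASM' or 'SSM'
    source_ip: Optional[str] = None   # meaningful when network_type is 'SSM'
    channel_scope: str = "global"     # 'global' or 'local'
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    mime_type: Optional[str] = None


class URSRegistry:
    """Per-domain registry enforcing unique 'URS Identifier' values."""

    def __init__(self):
        self._records = {}

    def register(self, record: URSRecord) -> None:
        if record.urs_identifier in self._records:
            raise KeyError(f"identifier already registered: {record.urs_identifier}")
        self._records[record.urs_identifier] = record

    def update(self, identifier: str, **changes) -> URSRecord:
        """Replace selected fields when session parameters change."""
        updated = replace(self._records[identifier], **changes)
        self._records[identifier] = updated
        return updated

    def lookup(self, identifier: str, now: float) -> Optional[URSRecord]:
        """Return the live record, or None if it is unknown or expired."""
        record = self._records.get(identifier)
        if record is None or record.expiration_time <= now:
            return None
        return record
```

Because uniqueness is checked only inside one registry instance, two different domains can each register the same identifier without conflict, which matches the per-domain scope described above.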

The content provider in a domain is required to register the session details, along with a unique URS identifier, with the URS in his/her domain. If the session's connection parameters change in the future, he/she is required to update the URS record immediately. This update process can be automated in the session management tool


[1] R. Wright, IP Routing Primer. Macmillan Technical Publishing, 1998.

[2] C. Partridge, T. Mendez, and W. Milliken, "Host Anycasting Service," RFC 1546
(Informational), Internet Engineering Task Force, Nov. 1993. [Online]. Available: [Accessed: July 21, 2010]

[3] B. Williamson, Developing IP Multicast Networks. Cisco Press, 1999.

[4] B. M. Edwards and B. Wright, Interdomain Multicast Routing: Practical Juniper
Networks and Cisco Systems Solutions. Boston, MA, USA: Addison-Wesley
Longman Publishing Co., Inc., 2002, foreword by John W. Stewart.

[5] S. Bhattacharyya, "An Overview of Source-Specific Multicast (SSM)," RFC 3569
(Informational), Internet Engineering Task Force, July 2003. [Online]. Available: [Accessed: July 21, 2010]

[6] S. Deering, "Host extensions for IP multicasting," RFC 1112 (Standard), Internet
Engineering Task Force, Aug. 1989, updated by RFC 2236. [Online]. Available: [Accessed: July 21, 2010]

[7] D. Farinacci and L. Wei, Auto-RP: Automatic discovery of Group-to-RP
mappings for IP multicast. CISCO Press, Sept 9, 1998. [Online]. Available: [Accessed: July 20, 2010]

[8] D. Meyer, "Administratively scoped IP multicast," RFC 2365 (Best Current Practice),
July 1998. [Online]. Available: [Accessed: July
20, 2010]

[9] M. Handley, "Session directories and scalable internet multicast address allocation,"
SIGCOMM Comput. Commun. Rev., vol. 28, no. 4, pp. 105-116, 1998.

[10] D. Zappala, V. Lo, and C. GauthierDickey, "The multicast address allocation problem:
Theory and practice," Special Issue of Computer Networks, 2004.

[11] V. Lo, D. Zappala, C. GauthierDickey, and T. Singer, "A theoretical framework for
the multicast address allocation problem," in IEEE Globecom, Global Internet
Symposium, Tech. Rep., 2002.

[12] M. Livingston, V. Lo, K. Windisch, and D. Zappala, "Cyclic block allocation: A
new scheme for hierarchical multicast address allocation," in First International
Workshop on Networked Group Communication. Bowersock, 1999, pp. 216-234.
[Online]. Available:
[Accessed: July 20, 2010]

[13] S. Pejhan, A. Eleftheriadis, and D. Anastassiou, "Distributed multicast address
management in the global Internet," Selected Areas in Communications, IEEE
Journal on, vol. 13, no. 8, pp. 1445-1456, Oct 1995.



[14] S. Kumar, P. Radoslavov, D. Thaler, C. Alaettinoglu, D. Estrin, and M. Handley, "The
MASC/BGMP architecture for inter-domain multicast routing," SIGCOMM Comput.
Commun. Rev., vol. 28, no. 4, pp. 93-104, 1998.

[15] V. Jacobson, "Multimedia conferencing on the Internet," SIGCOMM, Aug 1994.

[16] Y. K. Dalal and R. M. Metcalfe, "Reverse path forwarding of broadcast packets,"
Commun. ACM, vol. 21, no. 12, pp. 1040-1048, 1978.

[17] D. Waitzman, C. Partridge, and S. Deering, "Distance Vector Multicast Routing
Protocol," RFC 1075 (Experimental), Internet Engineering Task Force, Nov. 1988.
[Online]. Available: [Accessed: July 21, 2010]

[18] A. S. Thyagarajan and S. E. Deering, "Hierarchical distance-vector multicast routing
for the MBone," in SIGCOMM '95: Proceedings of the conference on Applications,
technologies, architectures, and protocols for computer communication. New York,
NY, USA: ACM, 1995, pp. 60-66.

[19] D. Estrin, D. Farinacci, A. Helmy, V. Jacobson, and L. Wei, "Protocol independent
multicast (PIM) dense mode protocol specification," 1996. [Online]. Available: [Accessed: July
20, 2010]

[20] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson,
C. Liu, P. Sharma, and L. Wei, "Protocol Independent Multicast-Sparse Mode
(PIM-SM): Protocol Specification," RFC 2362 (Experimental), Internet Engineering
Task Force, June 1998, obsoleted by RFCs 4601, 5059. [Online]. Available: [Accessed: July 20, 2010]

[21] J. Moy, "Multicast Extensions to OSPF," RFC 1584 (Historic), Internet Engineering
Task Force, Mar. 1994. [Online]. Available:
[Accessed: July 20, 2010]

[22] A. Ballardie, "Core Based Trees (CBT) Multicast Routing Architecture," RFC
2201 (Historic), Internet Engineering Task Force, Sept. 1997. [Online]. Available: [Accessed: July 20, 2010]

[23] T. Bates, R. Chandra, D. Katz, and Y. Rekhter, "Multiprotocol Extensions for BGP-4,"
RFC 4760 (Draft Standard), Internet Engineering Task Force, Jan. 2007. [Online].
Available: [Accessed: July 20, 2010]

[24] D. Thaler, "Border gateway multicast protocol (BGMP): Protocol specification," RFC
3913 (Historic), Sep 2004. [Online]. Available:
[Accessed: July 20, 2010]

[25] B. Fenner and D. Meyer, "Multicast Source Discovery Protocol (MSDP)," RFC 3618
(Experimental), Internet Engineering Task Force, Oct. 2003. [Online]. Available: [Accessed: July 21, 2010]


4.6.5 Fairness in Workload Distribution

The proposed architecture is a collaboration among several independent administrative domains. In a collaborative environment, it becomes important to distribute responsibilities evenly. 'mDNS' achieves fair hash-space allotment to participating domains, and the division is periodically updated to reflect changes in the global topology. We must acknowledge, however, that equitable hash-space distribution does not guarantee fair workload distribution. Skewed popularity of some keywords over others would increase the database load on the domain that was assigned the hash space to which a popular keyword routes. Further, the communication overhead increases as one goes up the tree. Regardless, the proposal kept workload distribution as a goal during the design phase.

4.6.6 Plug-n-Play Design With Low System Administrator Overhead

In 'mDNS', the system administrator of a domain needs to set only a few parameters in the URS. Other components do not need any administrator involvement. The 'mDNS' hierarchy is self-adaptive to changes in topology, is able to detect failures, and is equipped to recover from intermittent failures on its own. Reducing system administrators' involvement in the management of the global hierarchy was a major design goal.

4.6.7 Partial and Phased Deployment

A new architecture cannot be expected to be universally deployed over a short time. The success of the proposal depends on the value it adds to the Internet even if deployed at a very small scale. As user demand grows, larger deployments will happen, and they should integrate seamlessly with the infrastructure already deployed. 'mDNS' can be deployed in phases. A stand-alone deployment in a domain would provide search and bookmark-ability of sessions in that domain and domain-specific search capability to global users. With gradual network-wide deployment, 'mDNS' domains can link up seamlessly and manage the DHT on their


4.6.2 Existence in Present Network Environment
4.6.3 Real Time Session Discoverability
4.6.4 Ability to Perform a Multi-Parameter Search
4.6.5 Fairness in Workload Distribution
4.6.6 Plug-n-Play Design With Low System Administrator Overhead
4.6.7 Partial and Phased Deployment
4.6.8 Self Management
4.6.9 Multicast Mode Independence
4.7 Looking Back High Level Assessment of the 'mDNS' Service Framework
4.8 Conclusion


5.1 Introduction
5.2 Simulation Environment and Strategy Description
5.2.1 Starting the Simulation
5.2.2 Validity
5.2.3 Simulation Domain Hierarchy Setup
5.3 Simulation Results
5.3.1 Latency Experiment Results
5.4 Qualitative Analysis and Comparison
5.4.1 Geo-Tagged Database Complexity Analysis
5.4.2 Hash-Based Keyword Routing Fairness Analysis
5.4.3 A Comparison with Other DHT Schemes
5.5 Conclusion

6 CONCLUDING REMARKS


REFERENCES

BIOGRAPHICAL SKETCH




[37] A. I. T. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object
location, and routing for large-scale peer-to-peer systems," in Middleware '01:
Proceedings of the IFIP/ACM International Conference on Distributed Systems
Platforms Heidelberg. London, UK: Springer-Verlag, 2001, pp. 329-350.

[38] P. Maymounkov and D. Mazières, "Kademlia: A peer-to-peer information system
based on the XOR metric," in Lecture Notes in Computer Science. Springer Berlin
/ Heidelberg, 2002, pp. 53-65.

[39] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker, "A scalable
content-addressable network," in SIGCOMM '01: Proceedings of the 2001
conference on Applications, technologies, architectures, and protocols for computer
communications. New York, NY, USA: ACM, 2001, pp. 161-172.

[40] M. Handley, "The sdr session directory: An MBone conference
scheduling and booking system," April 1996. [Online]. Available: http:
// [Accessed: July 20, 2010]
[41] P. Namburi and K. Sarac, "Multicast session announcements on top of SSM,"
Communications, 2004 IEEE International Conference on, vol. 3, pp. 1446-1450
Vol.3, 20-24 June 2004.

[42] P. Liefooghe and M. Goosens, "The next generation IP multicast session directory,"
SCI, Orlando FL, July 2003.

[43] C. M. Bowman, P. B. Danzig, D. R. Hardy, U. Manber, and M. F. Schwartz, "The
Harvest information discovery and access system," Computer Networks and ISDN
Systems, vol. 28, no. 1-2, pp. 119-125, December 1995.

[44] A. Swan, S. McCanne, and L. A. Rowe, "Layered transmission and caching for
the multicast session directory service," in ACM Multimedia, 1998, pp. 119-128.
[Online]. Available:
[Accessed: July 20, 2010]

[45] A. Santos, J. Macedo, and V. Freitas, "Towards multicast session directory services."
[Online]. Available:
[Accessed: July 20, 2010]

[46] N. Sturtevant, N. Tang, and L. Zhang, "The information discovery graph: towards
a scalable multimedia resource directory," Internet Applications, 1999. IEEE
Workshop on, pp. 72-79, Aug 1999.

[47] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A
scalable peer-to-peer lookup service for internet applications," SIGCOMM Comput.
Commun. Rev., vol. 31, no. 4, pp. 149-160, 2001.



Figure 5-1. Screenshot mDNS auto simulator program

The host environment for running the simulation was a Windows machine with the

following configuration -

CPU: Intel Core2 Quad Q6600 @ 2.40 GHz
Memory: 5120 MB
OS: Windows 7 Professional 64 bit

We also developed a session discovery latency measurement tool that allowed us

to measure the latency between session registration and its discovery by end users.

The tool allowed us to start up to 10 virtual connected domains and took a list of keys

the duration of the sessions, the approach generates a tremendous amount of traffic in
the network. Clearly the "sdr" approach is not scalable as the number of content providers


Another problem with "sdr" and its underlying SAP implementation is caused by
announcement bursts. The delay between burst cycles is greater than the multicast routing
state timeout period, a consequence of the default bandwidth restriction of 4000 bps in
SAP. As a result, unnecessary control packets are sent in the network to recreate the
already timed-out multicast distribution tree in the core network.

Multicast session announcements on top of SSM

In their published work titled "Multicast session announcements on top of SSM"
[41], the authors have tried to address some of the issues in "sdr". They proposed a
multi-tier mesh of relay proxy servers to announce multicast sessions using SSM to

interested recipients. In their approach, every network operator that provides SSM

service also runs a SAS (Session Announcement Server). They propose relaxing the

bandwidth limit of SAP in local networks to a higher bandwidth limit. Further each such

SAS server links to the level 2 SAS server that runs in the core network. Every level

2 SAS server is interconnected in a mesh fashion with each other. Such an extensive

mesh could cause significant network traffic in the core network with increasing number

of level 2 SAS servers deployed. They assume that only a few level 2 SAS servers

would be needed in their scheme. Regardless, their scheme still remains a push based

scheme and suffers from limitations of SAP. There still remains a significant delay in
session information being disseminated to remote hosts (albeit a much smaller delay
than with sdr). Their scheme also transmits the complete session details to every
SAS server in the hierarchy on a periodic basis, causing unnecessary network traffic.
The administrative burden increases in this scheme as well, as every level 2 SAS server
must be fed the connection details of every other level 2 SAS server.

Table 5-5. DHT feature comparison
DHT scheme   Routing table size                              Average hop count
Chord        m                                               O(log N)
Tapestry     b x log_b N                                     O(log_b N)
Pastry       (2^b - 1) x ⌈log_{2^b} N⌉ + 2 x L               ⌈log_{2^b} N⌉
Kademlia     for 0 ≤ i < 160, a k-bucket list of nodes at    ⌈log_2 n⌉
             distance 2^i to 2^(i+1)
CAN          2 x d                                           (d/4) x n^(1/d)
mDNS         2 + fan-out factor (k)                          O(log_k n)

The mDNS DHT deviates from other DHT designs in many ways. Let us look at some of
the differences now:

* The mDNS overlay mirrors the domain hierarchy, whereas other schemes employ a flat
structure in which overlay node IDs are generally assigned randomly in a circular
hash-space ring.
* Generally, in DHT based overlays, participating nodes that are neighbors in the
nodeID space may be very far apart in actual network distance. The neighboring
nodes in mDNS, because the overlay mirrors the domain hierarchy in the Internet, are
likely to be close in actual network distance.
* In a typical DHT based overlay, all participating nodes assume similar responsibilities
and workloads because the nodes are in a flat arrangement. Even in Kademlia [38],
which employs a binary tree arrangement of nodes, all participating nodes are leaves
in the binary tree. In contrast, in mDNS, the nodes higher up typically carry a larger
message routing burden than the leaf nodes. The root node manages the
overall hash space allotment and its subsequent management.

Similar to other DHT overlays that have a constant Relative Delay Penalty (RDP)

factor [69], because of the nature of the mDNS hierarchy where neighboring domains

are more likely to be network neighbors too, we conjecture that mDNS RDP factor

will be within a constant factor of actual network routing path length. Let us compare

the various DHT schemes with respect to their respective routing table sizes, average

routing hop counts, and their logical node placement strategies. Table 5-5 shows the


Among the compared DHT schemes, Chord [47] and Pastry [37] place nodes
in logical ID-space rings, Tapestry [49] [56] arranges nodes in a logical graph,
Kademlia [38] constructs a binary prefix tree with nodes as leaves,
CAN [39] arranges participating nodes in a d-dimensional coordinate space, and mDNS


The actual geo-search complexity depends on the length of linked-lists, O(list)

rooted at the tree-leaves in our sparse-matrix representation. Hence the search

complexity can be approximated by -

C x (N x k[I9k 9- L10gk x O(list)) (5-4)

where C is a constant that can vary between 1 and 4 depending on the proximity of the
search criterion (read: coordinates) to the grid edges or corners of the target quadrant at tree
height h'. The search complexity can be reduced greatly if we replace the leaf linked-lists with
hash-tables and use perfect hashing functions.

5.4.2 Hash-Based Keyword Routing Fairness Analysis

Although it is nearly impossible to find out a priori the relative popularity of keywords
used in session searches, let us, for the time being, assume that every keyword is
equally likely to be searched. Further, because routing is done using the MD5 hash of these
keywords, the cryptographic nature of the hashing function makes every hash value in the
entire hash space equally likely to be routed. Keeping these assumptions in mind,
let us analyze the hash space distribution among participating mDNS MSD servers and
the search and routing workload on them.

Suppose the root node has k child domains such that the sum of MSD-designate
count at the root node is N. Let us denote the node count reported to the root node
by each child node as $n_i$. Thus

$\sum_{i=1}^{k} n_i = N$    (5-5)

Since the MD5-128 hash is used, the keyword hash space that must be distributed among
participating nodes is $2^{128}$. As we use prefix routing in mDNS, suppose the number of
significant bits needed to route appropriately is m. Therefore

$2^m > N$  or  $m > \log_2 N$    (5-6)
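As a rough illustration of (5-5) and (5-6), the following Python sketch computes the number of significant prefix bits for a given node count and then splits the resulting prefix space among child domains in proportion to their reported counts. The function names and the proportional-split rule are assumptions made for illustration only; they are not taken from the mDNS implementation.

```python
from typing import List, Tuple


def significant_bits(total_nodes: int) -> int:
    """Smallest m with 2**m > total_nodes, following (5-6) as printed."""
    return total_nodes.bit_length()


def allot_prefix_ranges(child_counts: List[int], m: int) -> List[Tuple[int, int]]:
    """Split prefixes 0 .. 2**m - 1 into contiguous, inclusive ranges.

    Assumption: each child receives a share proportional to the node
    count n_i it reported, with the counts summing to N as in (5-5).
    """
    total = sum(child_counts)
    space = 1 << m
    ranges = []
    start = 0
    seen = 0
    for count in child_counts:
        seen += count
        end = (seen * space) // total  # exclusive upper bound of this share
        ranges.append((start, end - 1))
        start = end
    return ranges


counts = [5, 2, 8]                      # hypothetical n_i values, N = 15
m = significant_bits(sum(counts))
ranges = allot_prefix_ranges(counts, m)
```

With these hypothetical counts, four prefix bits suffice and the three children receive the prefix ranges 0-4, 5-6 and 7-15.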


Algorithm 5: Node failure recovery algorithm
begin
    {no parent heartbeat received for 'n' consecutive timeouts}
    if subscribed to receive parent's communication over PMCAST then
        {send request to parent to receive communication over unicast}
        if connection re-established then
            {proceed to function normally}
        else
            {proceed to else section of outer 'if'}
        end
    else
        {send [tracer] request to the root node}
        {upon receiving graft details, initiate grafting}
        {periodically keep pinging original parent node}
        if original parent comes online later then
            {detach from the temporary graft location}
            {resume regular operations}
        end
    end
end
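The control flow of Algorithm 5 can be sketched in Python as a function that returns the ordered recovery steps a child node would take. This is only an illustration of the branching logic; the function name, its two boolean inputs and the step strings are hypothetical and do not come from the mDNS implementation.

```python
from typing import List


def recovery_plan(uses_pmcast: bool, unicast_restored: bool) -> List[str]:
    """Ordered steps after 'n' consecutive parent heartbeats were missed.

    uses_pmcast:      the node receives parent communication over PMCAST
    unicast_restored: a unicast request to the parent re-established contact
    """
    steps = []
    if uses_pmcast:
        steps.append("request unicast link to parent")
        if unicast_restored:
            steps.append("resume normal operation")
            return steps
        # Otherwise fall through to the grafting branch (outer 'else').
    steps.append("send tracer request to root node")
    steps.append("graft at ancestor named in root's reply")
    steps.append("keep pinging original parent; detach graft when it returns")
    return steps
```

A node that listens on PMCAST and regains contact over unicast stops after two steps; in every other case it ends up grafting temporarily and watching for the original parent to return.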

2.4 Related Work

We have so far seen the construction and maintenance of the tree DHT structure

and using such a structure for aiding in a seamless multicast session search and

discovery by an end user. Let us take a brief look into some of the other competing

multicast search strategies and peer-to-peer DHT schemes.

2.4.1 Multicast Session Search Strategies

mbone sdr session directory

Traditionally "sdr" [40] has been used to create as well as broadcast multicast
session information to all interested parties. "sdr" uses SDP and SAP for packaging
and transmitting this multicast session information on a well known globally scoped
multicast channel. But "sdr" has numerous limitations.
The bandwidth restrictions enforced on SAP cause significant delays in session
information reaching remote hosts. Also, every receiver must constantly listen to periodic
announcements on this channel, and "sdr" clients multicast the session details for



1-1 IANA assigned multicast addresses (few examples)
3-1 Common DNS record types
4-1 Typical cache structure
5-1 Partial simulation data for scenario 1 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
5-2 Partial simulation data for scenario 2 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
5-3 Partial simulation data for scenario 3 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
5-4 Latency measurements summary
5-5 DHT feature comparison


failure recovery algorithm. As part of the parent's heartbeat messages, the child node

gets the hash space assigned to the parent node. It uses this knowledge to find a still

alive ancestor node in the higher up hierarchy. In the face of node failures, one might

ask this question, how would one recover the stored records now rendered inaccessible

by the loss of this failed node? The shadow copy stored at a location determined by

bit inverting the keyword hash would allow the end user to locate the inaccessible

records stored at the failed node. Even with the possible hash space reassignments

later (in case the node failure is substantially prolonged), the shadow copy, with high

probability, will exist in the hierarchy. Additionally, each session registration may have to
be refreshed with a set periodicity by the session originators. This would allow session
registration details to be reinserted and remain discoverable by end users in case of node
failures and topological changes in the global hierarchy. This approach is consistent
with the recovery approach adopted by several popular DHT schemes [37] [38] [39] but has
not yet been incorporated in either the proof of concept implementation or our proposed
IETF RFC [36].
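The shadow-copy placement described above, where the shadow location is obtained by bit inverting the keyword hash, can be illustrated with a short Python sketch. The helper names are hypothetical, and the use of the leading bits as the routing prefix follows the prefix-routing description elsewhere in this dissertation; treat it as a sketch rather than the reference implementation.

```python
import hashlib

HASH_BITS = 128  # length of an MD5 digest in bits


def keyword_hash(keyword: str) -> int:
    """MD5 hash of a keyword, interpreted as a 128-bit unsigned integer."""
    digest = hashlib.md5(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shadow_hash(value: int) -> int:
    """Bit-inverted hash used to place the shadow copy of a record."""
    return value ^ ((1 << HASH_BITS) - 1)


def routing_prefix(value: int, significant_bits: int) -> int:
    """Leading bits of the hash, as used for prefix routing."""
    return value >> (HASH_BITS - significant_bits)


primary = keyword_hash("soccer")   # hypothetical session keyword
shadow = shadow_hash(primary)
```

Because every bit is flipped, the routing prefixes of a record and its shadow copy always add up to 2^m - 1 for m significant bits, so the two copies land at mirrored positions of the prefix space.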

Figure 2-11 shows the sequence of events after a node failure leading to temporary

grafting of the child node at an appropriate ancestor node in the hierarchy. Algorithm 5

describes what happens in a parent node failure situation.

If the hierarchy root domain fails, then each of the child nodes will have no
temporary 'graft' option. After some period each of them will assume root responsibilities
and the hierarchy will deteriorate into a disconnected forest. It is essential to provide
sufficient redundancy at the root level in the form of multiple backup MSD servers
running at any given time to prevent such a scenario from occurring. In a rare scenario,

simultaneous URS and MSDd failure can also result in a failed domain even if that

domain has multiple backup MSD servers. Such a scenario should be prevented at the

root level at least.

4.6.8 Self Management

Network topology can change gradually and sometimes rapidly. If an architecture is
capable of self-management and resource realignment in the face of infrastructure
changes, the user load can be redistributed and the user experience will degrade
or improve gradually, giving a sense of service stability. The 'mDNS' architecture uses
soft-state protocols to keep track of changes in network topology.

4.6.9 Multicast Mode Independence

Currently multicast is in a transitional state, migrating from ASM towards SSM

mode. So any system using multicast for optimized communication must be able

to exist in both ASM and SSM networks. 'mDNS' has that capability. Depending

on the underlying network type, it can subscribe to appropriate groups. In case the

communication is not possible over multicast at all, the components are capable of

switching to IP unicast in order to send/receive important communications.

The 'mDNS' architectural outline and specifics provided in this dissertation have been
able to achieve most of the goals that were identified early on. Let us now assess how
well these goals were met and the quality of the services provided, and identify
shortcomings and areas of improvement in the architecture.

4.7 Looking Back High Level Assessment of the 'mDNS' Service Framework

One of the design goals of 'mDNS' was equitable work load distribution. In the

proposal presented in this dissertation, we have achieved within acceptable limits

equal hash space division among participating domains. But we realize that actual

data distribution stored at each domain will be skewed depending on the popularity

of keywords used to tag the sessions. Also the communication load is not equal. The

nodes that are close to the root of the DHT will have to process more routing messages

through them. The DHT soft-state maintenance algorithm requires some control

messages to be sent between parent and children domains at periodic intervals. The

communication overhead is not a function of the level in the tree but the fan-out factor


IP multicast is a very efficient mechanism for transmission of live data and video
streams to a large group of recipients. Compared to unicast, it offers tremendous
bandwidth savings to content providers and reduces traffic in the core network, freeing
up expensive bandwidth for other services. The bandwidth savings become
valuable and noticeable over thin trans-Atlantic data pipes. From an end user's
perspective, multicast improves the perceived quality of service because, instead
of multiple data streams competing for a congested link resource, there exists one data
stream, which therefore receives a larger share of the congested link's bandwidth.

Even though multicast has numerous benefits over unicast for data transmission

in various scenarios, its end user demand and network deployment remains sparse.

Unlike unicast, where the source and destination addresses are unique and generally

somewhat stable, multicast addresses are usually assigned for a short term to the

group. The group addressing is typically flat and offers no clue about message

transmission direction to routers. So data forwarding is typically done using RPF

checks and using a shared distribution tree. The construction of shared distribution

tree and source discovery including multicast routing requires the network layer to

implement several complex protocols such as MSDP, BGMP, Cisco-RP, PIM-SM.

This increased complexity in the network layer and therefore, increased network

management complexity acts as a deterrent for the system administrators against native

multicast deployment. Furthermore, lack of a scalable and realtime global multicast

session discovery support in the Internet and lack of usability prevents the end users

from tapping into the benefits of multicast.

With the increasing deployment of IGMP v3 in the network edges, end users have

gained capability to filter and subscribe to specific sources they are interested in. SSM

reduces network layer complexity and thereby increases its acceptance by system


These learned values are updated if the situation warrants it, and they help in proper
fault recovery and other functioning of the system.
4.2.2 System Setup in Various Network Environment
Depending on the nature of supported multicast in the network, the 'mDNS' global
hierarchy takes on different configurations. Figure 4-2 shows the communication overlay
in the ASM network scenario. Since the CMCAST channel of the parent domain is the same as
the PMCAST channel in the child domain, and communication is allowed to flow along both
parent-to-child and child-to-parent paths, the parent and all children domains join the
common multicast channel for communicating with each other.



Figure 4-2. A typical mDNS hierarchy in ASM network

The 'mDNS' structure is capable of operating in a mixed multicast environment as
well. A network domain that supports both ASM and SSM multicast mode of operation
and supports both (S, G) and (*, G) joins as well as deploys any required supporting

Scenario 3 domain 4

@ dom000

Scenario 3 domain 5

@ dom001

Scenario 3 domain 2



2010 Piyush Harsh




























Average hash skew - scenario 1
Skew standard deviation - scenario 1
Average control bandwidth - scenario 1
Control bandwidth standard deviation - scenario 1
Average route switches - scenario 1
Route switch standard deviation - scenario 1
Average route stabilization time - scenario 1
Route stabilization time standard deviation - scenario 1
Average hash skew - scenario 2
Skew standard deviation - scenario 2
Average control bandwidth - scenario 2
Control bandwidth standard deviation - scenario 2
Average route switches - scenario 2
Route switch standard deviation - scenario 2
Average route stabilization time - scenario 2
Route stabilization time standard deviation - scenario 2
Average hash skew - scenario 3
Skew standard deviation - scenario 3
Average control bandwidth - scenario 3
Control bandwidth standard deviation - scenario 3
Average route switches - scenario 3
Route switch standard deviation - scenario 3
Average route stabilization time - scenario 3
Route stabilization time standard deviation - scenario 3
Summary chart for latency experiments
Range chart for latency experiments
Median Latency


The gradual deployment of IPv6 will solve the lack of addresses for IP multicast in
IPv4. A significant issue with IPv4 was the high probability of address clash among
active multicast sessions due to lack of a managed address allocation scheme [9] [10]
[11] [12] [13] [14] [15]. And with SSM slowly replacing the ASM model of multicast in the
Internet (or at least we are hoping that it will be the case), the address clash problem will
be taken care of. Let us take a brief look at what is required for multicast data packets to
be routed in the network.

Multicast routing

Routing of data packets in multicast is generally done via shared distribution trees.

In contrast to unicast where packet forwarding decision is based on the destination

address, almost every multicast routing protocol makes use of some form of reverse

path forwarding (RPF) [16] check. For example, if the incoming data packet distribution

is done along a source tree (as in the case of SSM), the RPF check algorithm checks

whether the packet arrived on the interface that is along the reverse path to the source

from that router. If yes, then the packet is forwarded on all interfaces along which one
or more recipient host(s) can be reached. If not, the packet fails the RPF check and is


The RPF check table could be built using a separate multicast reachability table

as is done in the case of Distance Vector Multicast Routing Protocol (DVMRP) or could

be done using the IP unicast forwarding table, as is done in Protocol Independent Multicast


Many competing intra-network multicast routing protocols exist in the networks
today, including DVMRP [17] [18], PIM (both dense mode (DM) [19] and sparse mode
(SM) [20]), Multicast Open Shortest Path First (MOSPF) [21], and Core Based Trees (CBT)
[22]. Because PIM uses unicast forwarding tables for the RPF check,
it is easier to implement and is rapidly gaining acceptance among internet vendors.
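To make the RPF check concrete, the Python sketch below performs it against a unicast forwarding table, in the spirit of what is described above for PIM. The table layout, interface names and function names are hypothetical, and real routers keep considerably more state; this is only a minimal sketch of the decision itself.

```python
import ipaddress
from typing import Dict, List, Optional


def rpf_interface(source: str, unicast_routes: Dict[str, str]) -> Optional[str]:
    """Interface on the reverse path to `source` (longest-prefix match)."""
    src = ipaddress.ip_address(source)
    best = None
    for prefix, iface in unicast_routes.items():
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def forward(source: str, arrival_iface: str,
            unicast_routes: Dict[str, str],
            downstream_ifaces: List[str]) -> List[str]:
    """Interfaces to forward on; an empty list means the RPF check failed."""
    if rpf_interface(source, unicast_routes) != arrival_iface:
        return []
    return [i for i in downstream_ifaces if i != arrival_iface]


# Hypothetical forwarding table: prefix -> outgoing interface.
routes = {"10.0.0.0/8": "eth0", "10.1.0.0/16": "eth1"}
accepted = forward("10.1.2.3", "eth1", routes, ["eth1", "eth2", "eth3"])
rejected = forward("10.1.2.3", "eth0", routes, ["eth1", "eth2", "eth3"])
```

In this example the packet from 10.1.2.3 passes the check only when it arrives on eth1, the interface the unicast table would use to reach that source.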

PIM-SM has emerged as the protocol of choice for implementing multicast routing in

* CMCAST multicast address for a domain to send communication to its children
* IGMP version support
* URL string of the parent domain; if no parent URL exists, it is set to 'void'

URS will be described in more detail in Chapter 3. The MSD servers in each domain
retrieve the above parameters during the bootstrap phase. The values of the parameters

allow the MSD server to join an existing hierarchy. Multiple MSD servers can be started

in the same domain to improve local redundancy. Every MSD server is equipped

to execute a common leader election protocol. The elected leader becomes the

'designated' MSD server. Such a server in a domain will be referred to as MSDd

Figure 2-2. A general domain hierarchy

Each domain reports the count of domains in the subtree rooted at self to its parent

domain. Figure 2-2 shows a general domain hierarchy. The numbers listed next to each

5-31 Average Latency
5-32 Average of weighted scores - scenario 1
5-33 Standard deviation of weighted scores - scenario 1
5-34 Average of weighted scores - scenario 2
5-35 Standard deviation of weighted scores - scenario 2
5-36 Average of weighted scores - scenario 3
5-37 Standard deviation of weighted scores - scenario 3

Query Interval Code" and QRV is the "Querier's Robustness Variable" value. The "Group

Address" field is set to 0 to send a general query or is set to a specific multicast IP

address if a group specific query or a group and source specific query has to be

sent. If the "Type" field is not 0x11 or 0x22, the packet must be processed for backward
compatibility, depending on whether one of the following values is present instead.

1. 0x12: Version 1 Membership Report
2. 0x16: Version 2 Membership Report
3. 0x17: Version 2 Leave Group

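 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Type = 0x22  |    Reserved   |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Reserved            |  Number of Group Records (M)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
.                        Group Record [1]                       .
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
.                        Group Record [2]                       .
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               .                               |
.                               .                               .
|                               .                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
.                        Group Record [M]                       .
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+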

Figure 1-6. IGMP v3 packet format membership report
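The fixed part of the membership report in Figure 1-6 is eight bytes long and can be decoded with Python's standard `struct` module, as the sketch below shows. The function name and the type-name table are illustrative only, and checksum verification and group record parsing are deliberately left out.

```python
import struct

# Type values mentioned in the text above.
IGMP_TYPES = {
    0x11: "Membership Query",
    0x12: "Version 1 Membership Report",
    0x16: "Version 2 Membership Report",
    0x17: "Version 2 Leave Group",
    0x22: "Version 3 Membership Report",
}


def parse_v3_report_header(data: bytes):
    """Return (checksum, number_of_group_records) from a v3 report.

    Layout (network byte order): Type(8) Reserved(8) Checksum(16)
    Reserved(16) Number of Group Records(16).
    """
    if len(data) < 8:
        raise ValueError("IGMPv3 report header needs at least 8 bytes")
    msg_type, _, checksum, _, num_records = struct.unpack("!BBHHH", data[:8])
    if msg_type != 0x22:
        name = IGMP_TYPES.get(msg_type, "unknown type")
        raise ValueError("not a v3 membership report: " + name)
    return checksum, num_records


# Hypothetical header announcing three group records.
sample = struct.pack("!BBHHH", 0x22, 0, 0xABCD, 0, 3)
header = parse_v3_report_header(sample)
```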

Using these IGMP query and IGMP report messages, the hosts and the routers
are able to determine on which neighboring interfaces they should forward the multicast
packets and which interfaces must be removed from the multicast forwarding list for
any given multicast address. Hosts can also take initiative and notify the neighboring


The simulations are run using a custom implemented DNS tool. This tool takes in a
domain-setup file that contains all necessary parameters such as PMCAST, CMCAST,
URS IP and port values, and other associated parameters that allow the simulation of
a connected hierarchy of domains. Here are the 'Virtual DNS Tool' configuration files
for each domain in the three connected hierarchies used for experimentation. The DNS
settings files should have an extension '.mds'. The domain numbers correspond to the
numbers shown in the hierarchy diagram Figure 5.3.

@ dom00
@ dom01
@ dom02

Scenario 1 domain 10



automatically commissioned if workload increases at a particular manager has not

been specified. The authors have specified several ongoing and future research areas

pertaining to IDG and once they are incorporated, it could turn out to be a viable

alternative to the above-mentioned approaches. Our proposal goes several steps further

and even allows users to perform searches based on geo-specific criteria and allows

them to bookmark their favorite sessions just like one can bookmark a popular webpage

these days.

2.4.2 Peer-2-peer DHT Schemes

Distributed Hash Table (DHT) schemes allow a faster and structured lookup

of resources in a distributed peer-to-peer network. Some of the more popular DHT

schemes are based on a circular arrangement of host nodes, e.g. Chord [47], Pastry [37]
and Bamboo [48], distributed mesh arrangements as in Tapestry [49], a hierarchical
structure such as Kademlia [38], and spatial DHT as in CAN [39] where routing is
done using Cartesian coordinates. The P2P DHT schemes allow for scalability, self
organization, and fault tolerance. Yet they may suffer from issues resulting from churn
[50] and non-transitivity [51] in connections among participating nodes. Researchers
have even proposed unstructured DHT [52] overlays that provide benefits of structured
DHTs. Let us now briefly look at some of these DHT schemes. In Chapter 5 we will
discuss reasons to develop our own DHT scheme and explain why we chose not to use
current DHT schemes for the mDNS architecture.

Chord

Chord [47] is a distributed P2P architecture that allows users to store keys and

associated data in an overlay. Given a key it provides service that maps that key onto

an existing node in the overlay. Each node maintains information about routing keys to

appropriate node in its local finger table. Finger tables are constructed based on local
interaction among participating nodes, and no node needs to know the global state
of the overall system. The routing table (finger table) size in a stable state grows at

lower level DNS servers. Mostly in name resolution the root servers are bypassed
entirely. Failure at root level should not impact 'mDNS' resolution process terribly.
* scenario 2 - failure at the TLDs: a failure in the TLD could cause problems in the
name resolution process unless the authoritative DNS details are cached at lower
level DNS servers, in which case the TLD can be bypassed. The caching at DNS
servers depends on the frequency of visits to a particular domain. Typically a DNS
cache entry is set to expire in 48 hours. Therefore, unless a particular domain
is visited often, the entry would not be present in the local DNS server and thus
the 'mDNS' name resolution process will fail.
* scenario 3 - failure along the resolution path: a failure in a DNS server along
the resolution path would also disrupt the name resolution process. But typically
many domains maintain a primary and a secondary DNS server, so an alternate
resolution path can be used. If the path converges before the failed link in the
resolution chain, the overall resolution process will fail as well.
* scenario 4 - failure of the authoritative DNS server: this will most likely lead
to failure, as the IP mapping of the URS is maintained as an A record at the
authoritative DNS server.

DNS failures are generally very rare. In the past there were some TLD poisoning

attacks but they were largely ineffective because of caching and replication of the TLD

DNS infrastructure.

4.5.2 Failure of URS

The immediate implication of URS failure at a particular domain is that all the
session 'mDNS' URLs registered at that domain's URS will become inaccessible. The
resolution process, which should ideally resolve to multicast session parameters so that the
end user can start accessing the multicast stream, will fail. But such a
failure will not impact resolution of any multicast session registered in any other 'mDNS'
domain. URS server failures can be tackled easily by providing replication. Replication can
also serve as a load balancing strategy by using the IP rotation feature of the administrative
DNS server.

URS failure can also affect normal 'mDNS' functionality in another way. The
URS maintains the parent domain's URL string and is therefore able to query the parent's
URS for details such as the IP address and port of the MSDd server in the parent's
domain. These details might be needed in case the MSDd server is unable to receive

@ dom0201
@ dom0200
@ dom0202

Scenario 1 domain 6


Scenario 1 domain 7



servers if present are there for fault tolerance. One out of many possible MSDs is

selected as designated MSD server of that domain. Communication among MSD

servers running in the same domain depends on the kind of multicast supported in the

network. If ASM mode is supported then intra-domain MSD servers communicate over

MSD-LOCAL-MCAST administratively scoped channel. This channel is also assumed

well known and possibly IANA assigned. If only SSM multicast mode is supported, the
communication among intra-domain MSD servers reverts to unicast to the URS as the send
channel, and the URS relays it back over an SSM channel to all MSD servers in that domain. The
channel would then become (URS-IP, MSD-LOCAL-MCAST) [using (S,G) notation].

As mentioned in Chapter 3 the URS acts as bootstrapping mechanism for MSD

servers. The system administrator needs to configure PMCAST, CMCAST, network's

IGMP support, and the parent's domain URL at the time of URS startup.

PMCAST is the globally scoped multicast group on which this domain receives
communication from its parent domain. If ASM mode is supported, then any communication
to the parent can be sent using this channel; otherwise the upstream communication
must be done through unicast. The PMCAST value of a particular domain is the same as
the CMCAST value in the parent domain.

CMCAST is the globally scoped multicast group over which a domain sends
communication to its children domains. If ASM mode is supported, then the child nodes
can communicate back to this domain over the same group; otherwise they must use
unicast to communicate upstream. The CMCAST value of a particular domain is the same as
the PMCAST value in any child domain.
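The relationship between these configuration parameters can be sketched with a small Python data class. The class, its field names, the example group addresses (taken from the 233.252.0.0/24 documentation range) and the example parent URL are all hypothetical; the sketch only mirrors the rules stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UrsConfig:
    """Hard-coded URS parameters (hypothetical field names)."""
    pmcast: Optional[str]       # group for traffic from the parent; None at the root
    cmcast: str                 # group for traffic to the children
    igmp_version: int           # IGMP version supported by the network
    parent_url: str = "void"    # 'void' when the domain has no parent


def channels_match(parent: UrsConfig, child: UrsConfig) -> bool:
    """A child's PMCAST must equal its parent's CMCAST."""
    return child.pmcast == parent.cmcast


def upstream_transport(child: UrsConfig, asm_supported: bool) -> str:
    """Multicast upstream only when ASM is available, otherwise unicast."""
    if asm_supported and child.pmcast is not None:
        return "multicast:" + child.pmcast
    return "unicast"


root = UrsConfig(pmcast=None, cmcast="233.252.0.1", igmp_version=3)
child = UrsConfig(pmcast="233.252.0.1", cmcast="233.252.0.2",
                  igmp_version=3, parent_url="parent.example")
```

Here `channels_match(root, child)` holds because the child listens on the group its parent sends on, and `upstream_transport` falls back to unicast whenever ASM is unavailable.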

Apart from hard-coded configuration parameters, URS also maintains several soft

state parameters. Important among them are -

* IP address of the parent's domain MSDd server
* IP address of MSDd server in self-domain


Figure 2-8. Screenshot session registration tool


protocols such as MSDP and RP-discovery and can act as glue between networks that

support only SSM or only ASM mode of operations. Such a network that can act as glue

is said to be operating in a hybrid environment. Figure 4-3 shows a scenario where two

hybrid multicast networks are shown as connecting disparate multicast networks.

Figure 4-3. An mDNS hierarchy in mixed network operation mode

The domain's URS helps decide what sort of multicast mode the MSD server will

operate in. The inclusion of parent's domain URL string allows the URS to contact the

parent domain's URS and get relevant network support information. In cases where
communication between parent and child is not possible using multicast, a unicast link is
set up between the two domains after a preset communication timeout (soft-state refresh).
Thus in scenarios where no hybrid network type exists and there is no

consistent network support for multicast, the communication hierarchy will degenerate

gracefully to unicast links between parent and children domains. Let us now see in some

detail how caching is used in 'mDNS'.

Significant Bits: 4 [NODE A]
Start   End   Next Node
0       0     self
1       5     B
6       7     E
8       15    F
*             parent

Significant Bits: 4 [NODE G]
Start   End   Next Node
9       9     self
10      11    H
12      13    I
14      15    J
*             parent

Figure 2-3. Example routing table structure
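A lookup against routing tables of the kind shown in Figure 2-3 can be sketched in Python as follows. The table contents are transcribed from the figure as far as they are legible; the function name and the list-of-tuples representation are assumptions made for illustration.

```python
from typing import List, Tuple

Route = Tuple[int, int, str]  # (start prefix, end prefix, next node)

# Tables from Figure 2-3, using 4 significant bits (prefixes 0..15).
ROUTES_A: List[Route] = [(0, 0, "self"), (1, 5, "B"), (6, 7, "E"), (8, 15, "F")]
ROUTES_G: List[Route] = [(9, 9, "self"), (10, 11, "H"), (12, 13, "I"), (14, 15, "J")]


def next_hop(prefix: int, routes: List[Route], default: str = "parent") -> str:
    """Next node for a hash prefix; unmatched prefixes go to the parent."""
    for start, end, node in routes:
        if start <= prefix <= end:
            return node
    return default
```

For example, a record whose hash prefix is 12 is sent from node A to F, and on reaching node G it is passed on to I, whereas a prefix of 3 arriving at node G falls outside its subtree and is handed to G's parent.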

node now does not lie within its assigned hash space, that record is migrated to the

correct newer destination using the hash-routing table maintained at each node.

Frequent addition of domains in the overall hierarchy can lead to frequent hash space

reassignments and frequent migration of records. The hierarchy routing infrastructure

and records location stability is improved using the domain-count reporting strategy

described in 2.2.4. Every node also knows the root node's unicast connection details,
which it may use in a dead-parent scenario to locate the appropriate
grafting point in the tree. The strategy is described in later chapters.

Removal of a domain

Removal of a domain from the hierarchy would result in the parent node not

receiving the periodic heartbeat messages from that node. After a predetermined

using the inverted hash value to another destination in the overall hierarchy. Figure 2-5

shows an example case where a record and its shadow copy are routed through the DHT

structure to appropriate target domains. The routing is done using the keyword hash

of the record. The hash value shown in the figure is arbitrary and is provided for clarity

only. Routing is done based on the first 4 bits of the hash in the figure.
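
As a rough illustration of how a record and its shadow copy could be steered to different parts of the hash space, the sketch below derives both routing prefixes from one keyword. It rests on assumptions that the text does not confirm: MD5 is used as the hash, and "inverted hash value" is read as the bitwise complement of the 128-bit digest. The function names are hypothetical.

```python
import hashlib

HASH_BITS = 128  # length of an MD5 digest in bits


def keyword_hash(keyword: str) -> int:
    """Return the MD5 digest of the keyword as an unsigned integer."""
    return int.from_bytes(hashlib.md5(keyword.encode("utf-8")).digest(), "big")


def routing_prefix(hash_value: int, significant_bits: int) -> int:
    """Keep only the most significant bits that the routing tables look at."""
    return hash_value >> (HASH_BITS - significant_bits)


def inverted(hash_value: int) -> int:
    """Bitwise complement of the digest (assumed meaning of 'inverted hash')."""
    return hash_value ^ ((1 << HASH_BITS) - 1)


h = keyword_hash("football")
primary = routing_prefix(h, 4)
shadow = routing_prefix(inverted(h), 4)
print(primary, shadow)
```

Under this reading the two 4-bit prefixes always add up to 15, so they are never equal and the shadow copy is routed on a different prefix than the primary record.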

[Figure: a record (REC) and its shadow copy (SHADOW) routed through the routing tables of node A and node G, each using 4 significant bits]

Figure 2-5. DHT record insertion example

Deletion of a session record

Explicit removal of a session record is not permitted in the architecture. Every

record has a set expiration time which is interpreted as the 'number of clock ticks'

into the future from the time the record was inserted. This provides protection from local

constructs a k-ary tree hierarchy of participating domains where 'k' is the number of

typical children domains attached at a node.

In Table 5-5, 'm' represents the number of bits representing a Chord node ID. 'N'

represents the number of participating nodes under Chord and Pastry, but represents the

size of namespace with base 'b'. 'n' denotes the actual number of participating nodes

for Kademlia, CAN, and mDNS entries. For Pastry, 'b' represents the number of bits

used to represent the base of the node ID representation, b = 4 signifying a base 16

representation. For Pastry, 'I' denotes the size of leaf set and proximity neighbors list.

5.5 Conclusion

The 'mDNS' framework we described in this dissertation allows for an easy

discovery of multicast session and improved usability due to URLs assignment capability

offered by the architecture. The session discovery is based upon the distributed tree

DHT structure that depends on the internal parameters α and β. In this chapter we

presented our simulation scheme and described the setup in detail. We performed

experiments with a range of values of α and β to find the range of values of these

parameters for better stability and better overall system performance.

We presented analytical assessment of session search complexity when using

Geo-DB database. We also presented arguments on the fairness claim made in the

dissertation with respect to the participating domains in the overall 'mDNS' hierarchy.

We presented search latency experiment results and their interpretations. We presented

a comparative analysis of various popular P2P architectures and the architecture

presented in this dissertation.








node in the hierarchy denotes the domain count that particular domain sends to its

parent. Soon the root node finds out the total count and the count distribution along

direction towards each of its children domains. This information allows the root node to

partition the overall hash-space into equal (almost equal, as the range is composed of

discrete values and not a continuous line) chunks to be assigned to all the participating

domains. The hash space allotment details travel from the root node to all the leaves

and all intermediate nodes. As the space allotment information trickles down through the

tree, each intermediate node makes a decision on how to further subdivide the space

among itself and the children domains. Depending on the total domain count, the hash

space is divided using only the 'n' most significant bits (MSB) of the entire hash-space

where 2^n ≥ Count. For example, if there are 8 domains in the hierarchy and the MD5

hash algorithm is used to generate the hash value (128 bits), only the 3 most significant bits

are used for hash space division. Algorithm 1 shows the hash space division process

performed at the root node.
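
The bit-count rule and the near-equal split can be sketched as follows. This is not Algorithm 1: it simply divides the prefix space evenly among a given number of domains and ignores how the root distributes ranges per subtree. The function names are hypothetical.

```python
def significant_bits(domain_count: int) -> int:
    """Smallest n with 2**n >= domain_count (at least one bit)."""
    if domain_count < 1:
        raise ValueError("domain_count must be positive")
    return max(1, (domain_count - 1).bit_length())


def partition_hash_space(domain_count: int) -> list[tuple[int, int]]:
    """Split the 2**n prefix values into domain_count near-equal inclusive ranges."""
    n = significant_bits(domain_count)
    size = 1 << n
    base, extra = divmod(size, domain_count)
    ranges = []
    start = 0
    for i in range(domain_count):
        # The first `extra` domains take one additional value each,
        # because the space is discrete and may not divide evenly.
        width = base + (1 if i < extra else 0)
        ranges.append((start, start + width - 1))
        start += width
    return ranges


print(significant_bits(8))        # 8 domains need 3 bits
print(partition_hash_space(10))   # 10 domains share the 16 values of a 4-bit prefix
```

With 10 domains the 16 prefix values cannot be shared evenly, so some domains receive two values and others one, which is the "almost equal" case mentioned in the text.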

An algorithm very similar to Algorithm 1 is performed at each intermediate node in

the hierarchy when it has to allot subspaces to its children. The only change is

that MSB is set to the value that is received from the parent node and is not computed,

and START and END values are set to the hash range start and end values received.

As and when the hash distribution is propagated down, the nodes update their routing

tables that allow a node to route a particular hash value towards the target domain in the

hierarchy. Figure 2-3 shows the example of hash space assignment to domains A and G

in the hierarchy as well as the DHT routing table construction at the two nodes, node A

and node G using algorithm 1.
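
A lookup against such a routing table can be sketched as below. The table entries are the ranges read from Figure 2-3 (the 14 to 15 entry for node G is inferred from a partly legible figure), and the function name `next_hop` and the default route to the parent are assumptions made for illustration.

```python
# (start, end, next node) entries as read from Figure 2-3; ranges are inclusive.
ROUTES_A = [(0, 0, "self"), (1, 5, "B"), (6, 7, "E"), (8, 15, "F")]
ROUTES_G = [(9, 9, "self"), (10, 11, "H"), (12, 13, "I"), (14, 15, "J")]


def next_hop(table, prefix, default="parent"):
    """Return the next node for a hash prefix, or the default route if no range matches."""
    for start, end, node in table:
        if start <= prefix <= end:
            return node
    return default


print(next_hop(ROUTES_A, 3))   # falls in 1..5
print(next_hop(ROUTES_G, 12))  # falls in 12..13
print(next_hop(ROUTES_G, 3))   # outside G's subtree, goes up to the parent
```

A prefix that falls outside every listed range is handed to the parent, which mirrors the wildcard entry in node A's table.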

Now that we have seen how the DHT hierarchy is created in the proposed

architecture, let us discuss the operations permitted in the DHT hierarchy. Typically,

any P2P DHT scheme allows insertion and removal of participating peers as well as

addition and deletion of data records. Each of these 4 operations is discussed next.

Since a 1° x 1° grid cell at the equator represents an area of 111.3 x 110.9 km², it might be

necessary to further subdivide the area into smaller zones. The grid subdivision or node

branching factor "k" determines how a larger grid area is subdivided. The depth of the tree

and the choice of "k" depend on the final areal resolution desired. For instance, if an areal

resolution of at most 5 x 5 km² is desired and the branching factor is 2,

i.e. k = 2, then a tree of height 5 would result in an areal resolution of 3.48 x 3.47 km²

at the equatorial plane. In general, the areal resolution at depth "n" for branching factor "k" is

governed by the equations below:

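\[ \frac{110.9}{k^{n}}\ \text{km} \tag{2-2} \]

\[ \frac{\pi}{180^{\circ}} \times \cos\phi \times \sqrt{\frac{a^{4}\cos^{2}\phi + b^{4}\sin^{2}\phi}{(a\cos\phi)^{2} + (b\sin\phi)^{2}}} \times \frac{1}{k^{n}}\ \text{km} \tag{2-3} \]

Equation (2-2) governs the north-south resolution at tree depth "n" and equation (2-3)

governs the east-west resolution at the same depth at latitude φ.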
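
The worked example in the text (k = 2, depth 5, roughly 3.48 x 3.47 km² at the equator) can be checked numerically. The sketch below assumes that the north-south resolution is 110.9 / k^n km and that the east-west resolution is the length of one degree of longitude at latitude φ divided by k^n, with the degree length computed from the geocentric radius. The WGS-84 radii used for "a" and "b" are an assumption, since the text does not state which values were used.

```python
import math

A_KM = 6378.137  # assumed equatorial radius "a" (WGS-84), in km
B_KM = 6356.752  # assumed polar radius "b" (WGS-84), in km


def north_south_km(k: int, n: int) -> float:
    """North-south cell size at depth n for branching factor k."""
    return 110.9 / k ** n


def east_west_km(lat_deg: float, k: int, n: int) -> float:
    """East-west cell size at depth n for branching factor k at a given latitude."""
    phi = math.radians(lat_deg)
    num = (A_KM ** 2 * math.cos(phi)) ** 2 + (B_KM ** 2 * math.sin(phi)) ** 2
    den = (A_KM * math.cos(phi)) ** 2 + (B_KM * math.sin(phi)) ** 2
    radius = math.sqrt(num / den)  # geocentric radius at latitude phi
    return math.pi / 180.0 * math.cos(phi) * radius / k ** n


print(round(east_west_km(0.0, 2, 5), 2), round(north_south_km(2, 5), 2))
```

Both values stay below the 5 km target from the example, while a tree of height 4 would give cells of about 6.9 km, so height 5 is the smallest depth that meets the requirement for k = 2.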

Any session that gets stored at either "global" or "local" databases also keeps

a corresponding geo-reference in the "geo-tagged" database. These references are

maintained at correct grid location in the level 0 structure and at correct leaf linked-list

in the tree rooted at corresponding level 0 grid position. The "garbage-collector"

thread while removing stale sessions from "global" and "local" databases removes the

corresponding reference from "geo-tagged" database as well. Maintaining this additional

structure allows new service paradigms to be supported that were previously not

possible. A few such services, such as real-time 'iReporting' and support for geo-specific

and proximity search criteria have already been mentioned earlier. Next let us look at the

key algorithms needed to support a seamless user search experience and discuss the

modalities of the supported operation.


1.1 IP Multicast

Network traffic in the Internet can be broadly classified into connection-oriented

or connectionless streams. The three communication paradigms that the Internet Protocol

(IP) supports are unicast, anycast [2], and multicast [3] [4]. Unicast allows point-to-point

communication between networked hosts. In IP unicast, the source and destination

addresses identify unique nodes in the global network. In the anycast model, more than

one host may be associated with a fixed anycast address. The

communication paradigm that it supports is a one-to-at-least-one model. The network

routers try to deliver the data to at least one of the hosts associated with that anycast

address. IP multicast lies at the other end of the spectrum. It allows for one-to-many

(SSM) [5] or many-to-many (ASM) [6] transmission paradigms.

The multicast transmission paradigm is partly determined by the distribution tree

that the core network uses for data distribution among interested recipient hosts.

A Rendezvous Point (RP) [7] based distribution tree generally allows Any Source Multicast

(ASM) to operate. In an RP-based distribution tree, a network host interested in receiving

group communication joins the distribution tree at one of the nearest leaf nodes. The

source sends data to the RP nodes and the data is disseminated down to all the

interested recipients. In the ASM model, the data source needs to locate the RP node in

order to send group data. The sender is not required to join the multicast group in order

to send the data. Many hosts can send data to the same multicast group by just

transmitting data through the RP node, hence the name "Any Source Multicast".

In Source Specific Multicast (SSM) which is sometimes also referred to as single

source multicast, the data distribution tree is rooted at the data source. Now, in addition

to finding out the multicast group that a recipient node is interested in joining, it must

also find out the source node IP address in order to join the correct data distribution tree.
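
The difference in what a receiver must know before joining can be made concrete with a small model. The sketch below is illustrative only: the class `JoinRequest` is hypothetical, the addresses are example values, and the "(*, G)" and "(S, G)" labels follow common multicast routing notation rather than anything defined in this text.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JoinRequest:
    """What a receiver must know to join: a group, plus a source for SSM."""
    group: str
    source: Optional[str] = None  # None models an ASM join

    def __post_init__(self) -> None:
        if not ipaddress.ip_address(self.group).is_multicast:
            raise ValueError("group must be a multicast address")
        if self.source is not None and ipaddress.ip_address(self.source).is_multicast:
            raise ValueError("source must be a unicast address")

    @property
    def notation(self) -> str:
        # "*" stands for "any source" in the ASM case.
        return f"({self.source or '*'}, {self.group})"


asm_join = JoinRequest(group="224.2.127.254")
ssm_join = JoinRequest(group="232.1.1.1", source="192.0.2.10")
print(asm_join.notation)
print(ssm_join.notation)
```

The extra `source` field is exactly the additional piece of information that an SSM receiver has to discover before it can join the correct distribution tree.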

[Figure: surface plot of average route stabilization time in seconds (X: Beta, Y: Alpha, Z: Time)]

Figure 5-10. Average route stabilization time - scenario 1

[Figure: surface plot of route stabilization time standard deviation in seconds (X: Beta, Y: Alpha, Z: STDEV)]

Figure 5-11. Route stabilization time standard deviation - scenario 1

for different values of α and β among three experimental runs for hierarchy 1. Figure 5-10

shows the routing table stabilization time for different values of α and β for the same

domain hierarchy structure.

Table 5-2 shows the partial data values for experiments done on domain hierarchy

setup type 2 and with domain starting order permutation list as [10, 4, 5, 6, 1, 2, 7, 8, 9,

3] and inter-domain startup delay values as [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. Table 5-3

shows the partial data values for domain hierarchy scenario 3 and with same domain

startup order and delay parameters as before.
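
To see what these two lists imply for a run, the sketch below turns them into a startup timeline. It assumes that the i-th delay is waited immediately before the i-th domain in the permutation list is started, and it leaves the time unit open, since neither is stated here. The function name is hypothetical.

```python
from itertools import accumulate

START_ORDER = [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
STARTUP_DELAYS = [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]


def startup_schedule(order, delays):
    """Pair each domain with its cumulative start offset (same unit as the delays)."""
    if len(order) != len(delays):
        raise ValueError("order and delays must have the same length")
    return list(zip(order, accumulate(delays)))


for domain, offset in startup_schedule(START_ORDER, STARTUP_DELAYS):
    print(f"domain {domain:2d} starts at offset {offset}")
```

Under this assumption the long 600 and 300 unit gaps fall before domains 2 and 9, so most of the hierarchy is already running when those two late domains join.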

[Figure: surface plot of average hash skew (X: Beta, Y: Alpha, Z: Skew)]

Figure 5-12. Average hash skew - scenario 2

[Figure: surface plot of average hash skew standard deviation (X: Beta, Y: Alpha, Z: STDEV)]

Figure 5-13. Skew standard deviation - scenario 2

and stores it in the 'Global Records Database'. Figure 2-9 shows a simple registration

scenario with session scope set as 'global'.

[Figure: session records REC[key1], REC[key2], ... held in the Local Records DB]

Figure 2-9. Session registration

Session search

"mDNS" architecture supports global as well as domain specific search. In a domain

specific search, the end user uses the 'mDNS' URL for the target domain to pass on the

search query to the target MSDd server in that domain. The domain runs the search

query only on the 'Local Session Records' database at that site and returns only globally

scoped multicast sessions, out of candidate sessions, back to the requesting end user.

The domain specific search support uses services from URS for 'mDNS' URL name

resolution. More details on URS are provided in Chapter 3.

The other kind of search that 'mDNS' supports is the global search function.

The end user presents the search query to the local MSDd server. Depending on the

search criteria, the 'mDNS' hierarchy routes and processes the query and the candidate

session details are sent back to the requesting end user. In order to reduce processing

load on the servers, the search aggregation is left for the end user's search tool to

perform. The details of this type of search are given next.

[Figure: mDNS global hierarchy]

[48] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica,
and H. Yu, "OpenDHT: a public DHT service and its uses," in SIGCOMM '05:
Proceedings of the 2005 conference on Applications, technologies, architectures,
and protocols for computer communications. New York, NY, USA: ACM, 2005, pp.

[49] B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz,
"Tapestry: a resilient global-scale overlay for service deployment," Selected Areas in
Communications, IEEE Journal on, vol. 22, no. 1, pp. 41-53, Jan. 2004.

[50] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, "Handling churn in a DHT,"
in ATEC '04: Proceedings of the annual conference on USENIX Annual Technical
Conference. Berkeley, CA, USA: USENIX Association, 2004, pp. 10-10.

[51] Freedman, M. J., Lakshminarayanan, Karthik, Rhea, Sean, and I. Stoica,
"Non-transitive connectivity and DHTs," WORLDS'05: Proceedings of the 2nd
conference on Real, Large Distributed Systems, pp. 55-60, 2005.

[52] K. P. N. Puttaswamy and B. Y. Zhao, "A case for unstructured distributed hash
tables," in Proc. of Global Internet Symposium, Anchorage, AK, May 2007.

[53] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica,
and H. Yu, "OpenDHT: a public DHT service and its uses," in SIGCOMM '05:
Proceedings of the 2005 conference on Applications, technologies, architectures,
and protocols for computer communications. New York, NY, USA: ACM, 2005, pp.

[54] T. Roscoe and S. Hand, "Palimpsest: soft-capacity storage for planetary-scale
services," in HOTOS'03: Proceedings of the 9th conference on Hot Topics in
Operating Systems. Berkeley, CA, USA: USENIX Association, 2003, pp. 22-22.

[55] B. K. Sylvia, S. R. S. Rhea, and S. Shenker, "Spurring adoption of DHTs with
OpenHash, a public DHT service," in IPTPS, 2004.

[56] B. Y Zhao, J. D. Kubiatowicz, and A. D. Joseph, "Tapestry: An infrastructure for
fault-tolerant wide-area location and," University of California at Berkeley, Berkeley,
CA, USA, Tech. Rep., 2001.

[57] C. G. Plaxton, R. Rajaraman, and A. W. Richa, "Accessing nearby copies of
replicated objects in a distributed environment," in SPAA '97: Proceedings of the
ninth annual ACM symposium on Parallel algorithms and architectures. New York,
NY, USA: ACM, 1997, pp. 311-320.

[58] J. Kubiatowicz, D. Bindel, Y Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi,
S. Rhea, H. Weatherspoon, C. Wells, and B. Zhao, "OceanStore: an architecture
for global-scale persistent storage," in ASPLOS-IX: Proceedings of the ninth
international conference on Architectural support for programming languages and
operating systems. New York, NY, USA: ACM, 2000, pp. 190-201.

Figure 5-18. Average route stabilization time - scenario 2

[Figure: surface plot of route stabilization time standard deviation in seconds (X: Beta, Y: Alpha, Z: STDEV)]

Figure 5-19. Route stabilization time standard deviation - scenario 2

[Figure: surface plot of average hash skew (X: Beta, Y: Alpha, Z: Skew)]

Figure 5-20. Average hash skew - scenario 3

[Figure: surface plot of hash skew standard deviation (X: Beta, Y: Alpha, Z: STDEV)]

Figure 5-21. Skew standard deviation - scenario 3

5.3.1 Latency Experiment Results

We performed latency measurement experiments with 1 to 5 domains using

the arrangement of hierarchy scenario 3 shown in Figure 5-3. The data values

represented in Table 5-4 are in milliseconds.

Figure 5-28 shows all the parameters represented as horizontal bars. X axis

denotes the time in milliseconds. Figure 5-29 shows the maximum, minimum, and

average latency values for experiments conducted with domains ranging in numbers

from 1 through 5. X axis shows number of domains and y-axis shows time in milliseconds.

[Figure: surface plot of average route stabilization time in seconds (X: Beta, Y: Alpha, Z: Time)]

communication from the parent domain over multicast and as a fall-back option would try

to switch over to IP unicast channel with the parent domain. If the domain's URS fails,

and the MSDd fails after URS failure, the backup MSD server would try to assume the

task of the MSDd and would not be able to establish a unicast channel with the parent

domain. The new MSDd server would then experience a soft-state timeout and try to initiate the 'mDNS'

domain failure algorithm to find a suitable grafting location with some ancestor higher up

in the hierarchy tree.

4.5.3 Failure of MSD Server

Failure of an MSD server can impact normal 'mDNS' operation if there are no backup

MSD servers to take up the responsibility of the failed MSDd server. If that happens, then

all child domains will experience a soft-state timeout and initiate the domain failure recovery algorithm

to find a suitable grafting point at some ancestor domain higher up in the hierarchy.

Failure of an MSDd server would not affect the 'mDNS' URL resolution as long as

the domain's URS is operational. The stored session records in the MSD database

will become inaccessible, but globally scoped multicast session records would still

(likely) remain searchable as a shadow copy is saved at another location in the overall

hierarchy. All the administratively scoped multicast session records would become

inaccessible through search to end users within that failed 'mDNS' domain.

4.6 Goals Achieved

In Chapter 2 we gave a list of design goals that the service architecture proposed in

this dissertation intends to achieve. Let us see how far these have been achieved.

4.6.1 Global Scalability and Distributed Design

Many of the earlier proposals for multicast session discovery, including 'sdr', had a flat

structure. Session record details were propagated to every sdr client active in

the Internet. This worked fine when the overall number of sessions was small. If multicast

is to gain the same level of acceptance as unicast, the number of multicast sessions

will increase exponentially, in which case sdr and similar proposals will fail to scale.


5.1 Introduction

In this chapter we present some experiments that we performed with the 'mDNS'

architecture. We describe our simulation strategy and justify our choice

of that strategy. Detailed simulation results are presented in raw data format along with a graphical

interpretation of the data. We then analyze and compare the

presented scheme with other DHT schemes along relevant lines. Message complexity

and workload analysis at various levels in the DHT tree are also presented.

5.2 Simulation Environment and Strategy Description

In order to test various system parameters and performance benchmarks we

developed a simulation strategy that allowed us to run multiple instances of the 'mDNS'

software in a simulated domain hierarchy on a single host machine. We developed

a simulator application that comprised a virtual DNS server implementation and an

interface between the virtual DNS and the actual MSD and URS implementations of the 'mDNS'

components. The unmanned simulation controller developed for this purpose took the

domain startup order sequence and the delay value list. It also took the starting and

ending values for the α and β parameters that govern the DHT stability algorithm mentioned

in Chapter 2. Figure 5-1 shows a screenshot of the auto simulator program. The

auto simulator program started the required number of virtual DNS applications with

appropriate configuration parameters and started the URS and MSD server for each

virtual DNS server instance thus creating a simulated number of 'mDNS' domains.

The virtual DNS server parameters were set in a way to link the domains appropriately

according to the simulation domain hierarchy scheme. The virtual DNS software was

capable of domain URL translation in an iterative manner. It also supported basic

protocol handling capabilities that allowed other programs to query certain simulation

parameters over TCP/IP sockets.

Full Text






I would like to extend my gratitude to my adviser, Dr. Richard Newman, who has been more of a father figure and a friend for me than just an adviser. His thoughts on life and the numerous discussions I have had with him over the number of years on almost everything under the sun has helped me a lot become a person that I am today. Special thanks to Dr. Randy Chow who guided me and took time out of his very busy schedule when Dr. Newman took a sabbatical break. His meticulous approach to scientific quest and his knowledge of how the system works was an eye opener. I would also like to thank all my friends that I made over the period of my stay at University of Florida, including Pio Saqui, Jenny Saqui, InKwan Yu, Mahendra Kumar and others (you know who you are) for keeping me sane and grounded. All of you have been a very pleasant distraction. You all will always be in my heart and mind. I would like to thank all of CISE office staff especially Mr. John Bowers for taking care of administrative details concerning my enrollment and making sure things proceed smoothly for me. I would like to thank CISE administrators with whom I had numerous discussions on intricacies of managing a large computer network. Notable among them are Alex M. Thompson and Dan Eicher. Lastly I would like to acknowledge UF Office of Research, ACM, College of Engineering, and UF Student Government for providing me with numerous travel grants for attending conferences held all over the world.


ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 GENERAL INTRODUCTION
  1.1 IP Multicast
    1.1.1 Why Multicast?
    1.1.2 Requirements for Enabling/Using Multicast
  1.2 What This Dissertation Tries to Solve?
  1.3 Conclusion
2 TOWARD SEAMLESS MULTICAST SESSION DISCOVERY
  2.1 Design Goals
  2.2 Distributed Hash Table
    2.2.1 Records Structure
    2.2.2 DHT Hierarchy Construction
    2.2.3 DHT Operations
    2.2.4 DHT Stability
  2.3 Supporting Multicast Session Discovery
    2.3.1 Database Design
    2.3.2 Associated Algorithms
    2.4.1 Multicast Session Search Strategies
    2.4.2 Peer-2-peer DHT Schemes
  2.5 Conclusion
3 TACKLING USABILITY
  3.1 IP Unicast vs Multicast
  3.2 Domain Name Service
    3.2.1 DNS Hierarchy
    3.2.2 DNS Name Resolution
    3.2.3 DNS Records
  3.3 URL Registration Server
    3.3.1 URS Internals
    3.3.2 mDNS Name Resolution
    3.3.3 Additional Usage
  3.4 Conclusion
4 BRINGING USABILITY AND SESSION DISCOVERY TOGETHER
  4.1 Revisiting Objectives
  4.2 Integrating 'mDNS' DHT and URL Scheme
    4.2.1 A Complete Picture
    4.2.2 System Setup in Various Network Environment
  4.3 Use of Caching
  4.4 Domain Specific Search
  4.5 Managing Faults
    4.5.1 Failure in Portions of DNS Infrastructure
    4.5.2 Failure of URS
    4.5.3 Failure of MSD Server
  4.6 Goals Achieved
    4.6.1 Global Scalability and Distributed Design
    4.6.3 Real Time Session Discoverability
    4.6.4 Ability to Perform a Multi-Parameter Search
    4.6.5 Fairness in Workload Distribution
    4.6.6 Plug-n-Play Design With Low System Administrator Overhead
    4.6.7 Partial and Phased Deployment
    4.6.8 Self Management
    4.6.9 Multicast Mode Independence
  4.7 Looking Back - High Level Assessment of the 'mDNS' Service Framework
  4.8 Conclusion
5 ARCHITECTURE VALIDATION: SIMULATION AND ANALYSIS
  5.1 Introduction
  5.2 Simulation Environment and Strategy Description
    5.2.1 Starting the Simulation
    5.2.2 Validity
    5.2.3 Simulation Domain Hierarchy Setup
  5.3 Simulation Results
    5.3.1 Latency Experiment Results
  5.4 Qualitative Analysis and Comparison
    5.4.1 Geo-Tagged Database - Complexity Analysis
    5.4.2 Hash-Based Keyword Routing - Fairness Analysis
    5.4.3 A Comparison with Other DHT Schemes
  5.5 Conclusion
6 CONCLUDING REMARKS

APPENDIX: SIMULATION CONFIGURATION PARAMETERS
REFERENCES
BIOGRAPHICAL SKETCH


Table
1-1 IANA assigned multicast addresses (few examples)
3-1 Common DNS record types
4-1 Typical cache structure
5-1 Partial simulation data for scenario 1 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
5-2 Partial simulation data for scenario 2 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
5-3 Partial simulation data for scenario 3 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]
5-4 Latency measurements summary
5-5 DHT feature comparison


Figure
1-1 Data transmission in unicast v multicast
1-2 Perceived data rate in unicast v multicast
1-3 Bandwidth requirements vs number of recipients in unicast and multicast
1-4 Multicast address format in IPv6
1-5 IGMPv3 packet format - membership query
1-6 IGMPv3 packet format - membership report
2-1 Local and global session records structure
2-2 A general domain hierarchy
2-3 Example routing table structure
2-4 Steps in DHT domain addition
2-5 DHT record insertion example
2-6 Global sessions database design
2-7 Geo-tagged database design
2-8 Screenshot - session registration tool
2-9 Session registration
2-10 Session search
2-11 Parent node failure recovery strategy
3-1 Location and names of DNS root servers [source: ICANN]
3-2 Typical steps in 'mDNS' URI name resolution
4-1 A typical mDNS domain components
4-2 A typical mDNS hierarchy in ASM network
4-3 A mDNS hierarchy in mixed network operation mode
5-1 Screenshot - mDNS auto simulator program
5-2 Screenshot - mDNS latency measurement tool
5-3 Various network topologies chosen for simulation
5-5 Skew standard deviation - scenario 1
5-6 Average control bandwidth - scenario 1
5-7 Control bandwidth standard deviation - scenario 1
5-8 Average route switches - scenario 1
5-9 Route switch standard deviation - scenario 1
5-10 Average route stabilization time - scenario 1
5-11 Route stabilization time standard deviation - scenario 1
5-12 Average hash skew - scenario 2
5-13 Skew standard deviation - scenario 2
5-14 Average control bandwidth - scenario 2
5-15 Control bandwidth standard deviation - scenario 2
5-16 Average route switches - scenario 2
5-17 Route switch standard deviation - scenario 2
5-18 Average route stabilization time - scenario 2
5-19 Route stabilization time standard deviation - scenario 2
5-20 Average hash skew - scenario 3
5-21 Skew standard deviation - scenario 3
5-22 Average control bandwidth - scenario 3
5-23 Control bandwidth standard deviation - scenario 3
5-24 Average route switches - scenario 3
5-25 Route switch standard deviation - scenario 3
5-26 Average route stabilization time - scenario 3
5-27 Route stabilization time standard deviation - scenario 3
5-28 Summary chart for latency experiments
5-29 Range chart for latency experiments
5-30 Median Latency
5-32 Average of weighted scores - scenario 1
5-33 Standard deviation of weighted scores - scenario 1
5-34 Average of weighted scores - scenario 2
5-35 Standard deviation of weighted scores - scenario 2
5-36 Average of weighted scores - scenario 3
5-37 Standard deviation of weighted scores - scenario 3


This dissertation addresses the issue of multicast session discovery by an end user. IP Multicast has tremendous network bandwidth utilization benefits over conventional data transmission strategies. Use of multicast could prove cost effective for many Content Distribution Networks (CDN). From an end user perspective, accessing a live stream using multicast will result in better video reception quality compared to the unicast transmission. This being imposed largely due to limited line bandwidth being shared among several competing data streams. Still the deployment is very sparse in the Internet.

One of the reasons is less user demand due to lower usability compared to IP unicast. The supporting network infrastructure that was deployed after standardization of TCP protocol helped tremendously in improving the usability of IP unicast. The Domain Name Service (DNS) infrastructure allowed users to access target hosts using a Fully Qualified Domain Name (FQDN) string against using the dotted decimal IP addresses [1]. Since the unicast IP addresses were allotted in a regulated manner and because of the longevity of assignments, it became easier to search and locate resources on the Internet. Lack of such infrastructure support has deprived multicast its usability from an end user perspective. More importantly, shared nature of multicast addresses and the short life of address use and frequent reuse from the common pool makes it difficult to search and discover content by the end user.


This dissertation provides a distributed hierarchical architecture that efficiently addresses some of the usability issues raised above. The tree hierarchy closely co-located with the DNS infrastructure allows the presented scheme to assign Universal Resource Identifiers (URIs) for multicast streams that an end user can bookmark. The proposed scheme automatically re-maps the correct session parameters with the URIs in case they change in future. The Distributed Hash Table (DHT) approach for search and discovery of multicast sessions presented in this dissertation uses a tree hierarchy which is more suitable for the task at hand. Many live multicast streams are not replicated, so there is a need to locate the source of the data and therefore the search scheme required is somewhat traditional in nature. The relative instability of many multicast streams and associated session parameters makes many traditional P2P DHT schemes unsuitable for the problems addressed in this work.

Simulation results and analytical comparison of the proposed scheme with existing approaches are presented towards the end of this dissertation. A detailed discussion of why several of the existing DHT schemes for keyword search and Session Announcement Protocol (SAP)/Session Discovery Protocol (SDP) based multicast session discovery schemes are unsuitable for the identified problem is presented as well.




Data transmission in unicast v multicast

In Figure 1-1, the source node has to replicate the data stream 4 times to support 4 recipient hosts. There is higher bandwidth load on intermediate sections of the core network as well. Comparing this with the case where the source node is transmitting data using multicast, the sender just provides one data stream and the core network


Since network bandwidth is a shared resource. Available bandwidth in the core network is generally shared among all competing traffic. The ongoing debate on net neutrality is in favor of maintaining this unbiased sharing of core network bandwidth. Assuming that the competing data streams receive a fair share of the link bandwidth, the overall stream data rate will be governed by the bandwidth it receives at the bottleneck link along the route from source to destination node. In such a scenario, multicast can play a big role in improving the perceived QoS at the receiving host.

Perceived data rate in unicast v multicast

Let us assume that the bottleneck link has 200 Kbps as shown in Figure 1-2. In the unicast scenario, there are 4 data streams sharing that bottleneck link, and therefore, assuming a fair share of bandwidth, each stream gets a 50 Kbps rate. Even though the


Multicast also offers tremendous cost benefits to the content providers. A provider transmitting data using multicast can potentially serve a large subscriber base using a small server farm, as the bandwidth requirement would be fairly constant regardless of the number of subscribers. In contrast, with unicast, the bandwidth requirement at the source (content provider) grows linearly with the number of subscribers. A popular content provider may potentially need to manage a large server farm and purchase larger bandwidth at a premium from its Internet Service Provider (ISP).

Bandwidth requirements vs number of recipients in unicast and multicast

Figure 1-3 shows the bandwidth requirements as the number of recipients grows for unicast versus multicast mode of data transmission at the content host (sender). The figure assumes that the base data stream transmission is at 100 Kbps and the core


Clearly, multicast offers tremendous bandwidth savings and monetary benefits to content providers, end users and core as well as fringe ISPs. Let us now find out how multicast can be enabled in the network and the hardware and software requirements before one can use it.

1. addressing
2. routing capabilities
3. ability for end users to join
4. must be usable from users' perspective
5. lower complexity of deployment for ISPs

The top 3 requirements are primary for multicast capability to even exist in the network, but the last 2 points are very important for actual deployment and usage by end users. Let us discuss each of these requirements in brief now.


Table 1-1. IANA assigned multicast addresses (few examples)

Address (IPv4)   Address (IPv6)           Usage                   Scope
                 FF02:0:0:0:0:0:0:1       All Node Addresses      Link Local
                 FF02:0:0:0:0:0:0:2       All Multicast Routers   Link Local
                 FF01:0:0:0:0:0:0:1       All Node Addresses      Node Local
                 FF05:0:0:0:0:0:1:3       All DHCP Servers        Site Local
                 FF0X:0:0:0:0:0:0:130     UPnP                    All Scopes

Keeping all these restrictions in view, the Internet Assigned Numbers Authority (IANA) adopted a somewhat relaxed attitude towards multicast addresses.

IANA assigned the old class D address space for multicast group addressing. All addresses in this range have the 1110 prefix as the first 4 bits of the IPv4 address. Therefore, IP multicast addresses range from 224.0.0.0-, the multicast data packet's scope was determined by Time to Live (TTL) scoping rules. Over time, TTL scoping was found to be confusing to implement and manage in the deployed networks. As IP multicast gained some traction, IANA started to manage the address space more efficiently. Table 1-1 shows some of the addresses that have been assigned by the IANA and their intended purpose and valid scopes.

In IPv4, 239.0.0.0 to 239.255.255.255 has been reserved as Administratively Scoped [8] multicast addresses. Data transmitted on these groups is not allowed to cross the administrative domain boundaries. For IPv6 this range is defined as FFx4::/16. FFxE::/16 is defined as global scope, i.e. data packets addressed to this address range are eligible to be routed over the public internet. Figure 1-4 shows the general format of IP multicast addresses in IPv6.

Figure 1-4. Multicast address format in IPv6
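The address ranges described above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `ipaddress` module; the function name `multicast_scope` and the returned labels are illustrative and do not come from the dissertation.

```python
import ipaddress

# IPv4 administratively scoped block named in the text (239.0.0.0 - 239.255.255.255).
ADMIN_SCOPED_V4 = ipaddress.ip_network("239.0.0.0/8")


def multicast_scope(address: str) -> str:
    """Classify an address using only the ranges mentioned in the text.

    The labels returned here are illustrative, not a standard vocabulary.
    """
    addr = ipaddress.ip_address(address)
    if not addr.is_multicast:
        return "not multicast"
    if addr.version == 4:
        # Every IPv4 multicast address starts with the bit pattern 1110.
        assert int(addr) >> 28 == 0b1110
        if addr in ADMIN_SCOPED_V4:
            return "administratively scoped"
        return "other IPv4 multicast"
    # IPv6: the low nibble of the second byte is the scope field (FFx4, FFxE, ...).
    scope = addr.packed[1] & 0x0F
    if scope == 0x4:
        return "administratively scoped"
    if scope == 0xE:
        return "global"
    return "other IPv6 scope"


if __name__ == "__main__":
    for a in ("239.1.2.3", "224.0.0.1", "ff0e::1", "ff14::1", "192.0.2.1"):
        print(a, "->", multicast_scope(a))
```

A session directory could apply such a check when deciding whether a registered session may be advertised outside its own domain.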


9][10][11][12][13][14][15]. And with SSM slowly replacing the ASM model of multicast in the Internet (or at least we are hoping that it will be the case), the address clash problem will be taken care of. Let us take a brief look into what is required for multicast data packets to be routed in the network.

16] check. For example, if the incoming data packet distribution is done along a source tree (as in the case of SSM), the RPF check algorithm checks whether the packet arrived on the interface that is along the reverse path to the source from that router. If yes, then the packet is forwarded on all interfaces along which one or more recipient host(s) can be reached. If not, the packet fails the RPF check and is dropped.

The RPF check table could be built using a separate multicast reachability table, as is done in the case of Distance Vector Multicast Routing Protocol (DVMRP), or could be done using the IP unicast forwarding table, as is done in Protocol Independent Multicast (PIM).

Many competing intra-network multicast routing protocols exist in the networks today, including DVMRP [17][18], PIM (both dense mode (DM) [19] and sparse mode (SM) [20]), Multicast Open Shortest Path First (MOSPF) [21] and Core Based Trees (CBT) [22], to name a few. Because PIM uses unicast forwarding tables for the RPF check, it is easier to implement and is rapidly gaining acceptance among internet vendors. PIM-SM has emerged as the protocol of choice for implementing multicast routing in


23] have become protocols of choice for developing inter-network multicast networks. Another upcoming protocol similar to MBGP is Border Gateway Multicast Protocol (BGMP) [24], which helps interconnect separate networks using a global bi-directional shared tree.

PIM-SM makes use of a shared tree for initial data delivery to receivers. These shared trees are rooted at nodes referred to as Rendezvous Points (RP). In a network, several RPs could exist, and each RP could be configured to act as the distribution tree root for several multicast groups. There exist several vendor-specific protocols, e.g. Cisco's Auto-RP [7], that allow all PIM-SM routers to learn the Group-to-RP mappings. Similar vendor-specific protocols exist for RP discovery.

Assuming that an RP in a domain would eventually know about all the active sources in that domain, there needs to be a way to find out about sources in other domains. This is achieved using Multicast Source Discovery Protocol (MSDP) [25]. Using MSDP, an RP is peered with another RP in some other domain. MSDP helps form a network among MSDP-peered RPs. These peered RPs exchange multicast source information among each other. Thus, in time, an RP in one domain is able to discover a multicast source in an external domain.

Thus enabling global IP multicast involves a plethora of complex protocols to be implemented in the multicast capable routers. The charm of SSM is that it does away with several complexities necessitated by multiple source support in ASM mode. Let us examine next how an end user joins or leaves a multicast group.

26][27][28][29][6][30][31] allows an end user to indicate to the first hop multicast capable router its interest in receiving multicast packets belonging to a particular group. The very first version of IGMP was the Host Membership Protocol written by Dr. Steve Deering as part of his doctoral research. The IGMP protocol is used by


The most recent version for IGMP is version 3 [31], whereas the latest MLD protocol stands at version 2 [29]. IGMP version 2 [30] added low host leave latency to IGMP 1 [6], and IGMP 3 added source filtering capabilities to version 2. MLD version 1 [27] provided similar functionalities as IGMPv2, and MLDv2 [29] allows for similar functions as IGMPv3. Both IGMPv3 and MLDv2 are designed to be backward compatible with earlier versions.

IGMPv3 packet format - membership query

Figure 1-5 shows the format of a membership query message that is sent by the multicast capable routers to query the membership status on the neighboring interfaces. Hosts can contact the neighboring routers, notifying them of their multicast reception state or any changes to it, using a membership report message. The format of the IGMPv3 membership report message is shown in Figure 1-6.

A multicast capable router uses the membership query message to find out if any active listener is present on any of its neighboring interfaces. In Figure 1-5, the Max Resp Code field denotes the maximum time allowed before sending the response to the query, S is the Suppress Router-Side Processing flag, QQIC is the Querier's


1. 0x12: Version 1 Membership Report
2. 0x16: Version 2 Membership Report
3. 0x17: Version 2 Leave Group

IGMPv3 packet format - membership report

Using these IGMP query and IGMP report messages, the hosts and the routers are able to determine on which neighboring interfaces they should forward the multicast packets and which interfaces must be removed from the multicast forwarding list for any given multicast address. Hosts can also take initiative and notify the neighboring


32] is the name resolution service that maps FQDNs to appropriate IP addresses and in turn makes the use of URLs possible. As most of the resources on the Internet are stable resources with long term availability, end users can bookmark FQDNs and URLs for future use.

Long term stability of resources and their availability has another benefit. These can be indexed by search engines using web crawlers. This allows users to locate content over the Internet using keyword searches. Keyword searches allowed by search engines like Yahoo and Google, along with the use of URLs and FQDNs, have helped improve the usability of the web for an average user.

As documented earlier, the lack of prior knowledge of group composition and of the time and duration of a group's existence, along with no explicit restriction on the use of a group address other than the general classification enforced by the IANA, presents several challenges that are absent in the case of unicast.

1. unstable group address: because of no long term stability associated with multicast addresses assigned to user groups, these cannot be used with the current DNS scheme. DNS has been designed with stability of FQDNs and


2. lack of a standard content discovery mechanism: as multicast contents have transient life cycles with varied duration and availability, it becomes almost impractical for modern search engines to crawl the multicast content space and maintain crawler data of contents whose availability is at best uncertain. There is a lack of a standard service that would allow end users to locate multicast contents.

Traditionally users get information about the time and duration of popular multicast groups through usenet groups and through emails from friends. Clearly these discovery mechanisms are not scalable if multicast has to become a user driven technology. There has to be a standard and scalable service that would allow end users to locate existing multicast sessions in almost real time. Improving usability will ensure the next wave of user acceptance for multicast technology. Let us examine now why ISPs have been reluctant to deploy multicast.

what all goes to make multicast work in a modern network. Supporting the ASM mode of operation is especially complex, as the responsibility of source discovery rests with the routers (configured as RPs). The 2-step PIM-SM protocol, where hosts initially get data via shared distribution trees rooted at RPs and later switch to a shortest path tree (SPT), adds complexity. With the addition of source filtering in IGMPv3 [31] and MLDv2 [29], the receiver gains the capability to specifically denote a set of sources it is interested in getting the multicast data from.

Network researchers agree that a single source multicast with strict unidirectional data flow from the source to interested hosts is much easier to implement. If, further, source discovery is made a user prerogative, the network will be released from the added burden of running the MSDP protocol and maintaining a list of active senders. Further


It is thus easy to see why network operators and router vendors are reluctant to implement and deploy ASM multicast in the network. But with IGMPv3 becoming a standard and SSM research maturing, the network layer responsibilities for IP multicast will be reduced significantly. If sufficient user demand exists for IP multicast (provided the usability concerns raised earlier are addressed), ISPs should have no difficulty in enabling IP multicast in their networks. Pricing incentives for content hosts and CDN service providers could also spur multicast growth.

33] and multicast streaming of digital TV in Spain are a few examples. True global deployment of native multicast will come only if ISPs are pressured to deploy multicast due to rising end-user multicast demand.

This dissertation work started with the question: why is there low end user demand for IP multicast? The lack of a seamless mechanism to search for multicast content was one of the factors. This dissertation presents a tree-based DHT approach that solves the multicast session search issue in a globally scalable manner. Geo-tagging was introduced in the session record structure to allow for more advanced search dimensionality. Another major factor for lower user demand was the lower usability of multicast technology as compared to unicast. Complementary support architecture is


The aim of this work is to herald in an era of true end user multicast use. If proper infrastructure support is provided, one can imagine several interesting use cases emerging from it. Real-time, truly scalable, citizen iReporting, which would instantly provide video feed to millions of viewers worldwide, could be one such use case. Disaster preparedness and disaster management could be made easier. All this would require an architecture that allows new multicast sessions to be made discoverable to others in a real time fashion. The proposed architecture achieves this very goal.

Some skeptics may ask: `Why not use Google or Yahoo search to achieve session discovery?'. The answer for that can be found in section


1. global scalability and distributed design - the proposed architecture must be globally scalable and should be distributed in nature to minimize bottlenecks
2. ability to exist in current network environment - a proposal that necessitates major network hardware and software overhaul will most likely not get deployed because of monetary constraints, so existence in the existing environment becomes important
3. real time discoverability of sessions - even those sessions that are created at a moment's notice should be immediately discoverable by users
4. multi-parameter search - the proposal must provide end users a multi-dimensional and full search capability
5. fairness in workload distribution - the design must be fair with respect to workload and resource requirements on the participating parties
6. plug-n-play design with lower system administrators' involvement - the proposal must require minimal involvement on the system administrators' part, as network administrators are already overworked
7. ability of partial/in-phase deployment - the deployment should be useful even if deployed on a very small scale
8. self managing structure in the face of topographical changes - the architecture should be self managing in the face of dynamic topographical changes
9. session geo-tagging to support location based search
10. multicast mode independence - ability to exist in ASM or SSM networks or even in non-multicast environments


34] is a reasonable tool that achieves equitable distribution of data over multiple sites. In a Distributed Hash Table (DHT) scheme, the record's keyword is hashed. The hash value determines where the actual record will be stored for retrieval later. In the mDNS search architecture, each data site or Multicast Session Directory (MSD) manages two types of records. These are called
Each site or MSD maintains three databases
These data sites are linked to one another in a tree hierarchy with a single root node. The DHT hash space is divided among all the participants in the tree overlay. The algorithms managing the hash space distribution and redistribution in the face of topology changes are discussed later in this chapter.

2-1 shows the components that make up an administratively scoped multicast session (local session) and globally scoped multicast session records. The importance of some of the data elements will be revealed in the next two chapters.

A brief explanation of the various fields follows next


Local and global session records structure


Some of the terms such as URS will be explained in great detail in Chapter 3. The hashing strategy that can be used is any secure hashing algorithm. For the proof of concept experiment we used MD5 [35].
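As an illustration of the hashing step, the sketch below maps a session keyword to a 128-bit MD5 value with Python's standard `hashlib`. The case and whitespace normalization and the helper names `keyword_hash` and `leading_bits` are assumptions made for this example; the dissertation does not specify how keywords are normalized before hashing.

```python
import hashlib

HASH_BITS = 128  # MD5 produces a 128-bit digest


def keyword_hash(keyword: str) -> int:
    """Return the MD5 digest of a keyword as an integer in [0, 2**128).

    Lower-casing and stripping are assumptions of this sketch, chosen so that
    'Soccer' and 'soccer ' land on the same directory node.
    """
    normalized = keyword.strip().lower().encode("utf-8")
    return int.from_bytes(hashlib.md5(normalized).digest(), "big")


def leading_bits(value: int, nbits: int = 4, width: int = HASH_BITS) -> int:
    """Return the most significant `nbits` bits of a `width`-bit hash value."""
    return value >> (width - nbits)


if __name__ == "__main__":
    h = keyword_hash("soccer")
    print(f"{h:032x}", format(leading_bits(h), "04b"))
```

Any hash with a uniform output could be substituted for MD5 here; only the digest width would change.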


URS will be described in more detail in Chapter 3. The MSD servers in each domain retrieve the above parameters during the bootstrap phase. The values of the parameters allow the MSD server to join an existing hierarchy. Multiple MSD servers can be started in the same domain to improve local redundancy. Every MSD server is equipped to execute a common leader election protocol. The elected leader becomes the `designated' MSD server. Such a server in a domain will be referred to as MSDd.

A general domain hierarchy

Each domain reports the count of domains in the subtree rooted at itself to its parent domain. Figure 2-2 shows a general domain hierarchy. The numbers listed next to each


1 shows the hash space division process performed at the root node.

An algorithm very similar to Algorithm 1 is performed at each intermediate node in the hierarchy when it has to allot subspaces to its children. The only change is that MSB is set to the value that is received from the parent node and is not computed, and the START and END values are set to the hash range start and end values received. As and when the hash distribution is propagated down, the nodes update their routing tables, which allow a node to route a particular hash value towards the target domain in the hierarchy. Figure 2-3 shows an example of hash space assignment to domains A and G in the hierarchy, as well as the DHT routing table construction at the two nodes, node A and node G, using Algorithm 1.

Now that we have seen how the DHT hierarchy is created in the proposed architecture, let us discuss the operations permitted in the DHT hierarchy. Typically, any P2P DHT scheme allows insertion and removal of participating peers as well as addition and deletion of data records. Each of these 4 operations is discussed next.
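The idea of dividing a hash range among a node and its children, and then routing on the resulting ranges, can be sketched as follows. This is not the dissertation's Algorithm 1, which is not reproduced in this text: splitting in proportion to the reported subtree domain counts, and the names `split_range` and `next_hop`, are assumptions made purely for illustration.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def split_range(start: int, end: int, weights: Dict[str, int]) -> Dict[str, Range]:
    """Split [start, end) into contiguous sub-ranges proportional to `weights`.

    `weights` maps a name ("self" or a child domain) to the number of domains
    in the subtree it represents (assumed weighting, see lead-in).
    """
    total = sum(weights.values())
    span = end - start
    ranges: Dict[str, Range] = {}
    cursor, accumulated = start, 0
    for name, weight in weights.items():
        accumulated += weight
        boundary = start + span * accumulated // total
        ranges[name] = (cursor, boundary)
        cursor = boundary
    return ranges


def next_hop(hash_value: int, own: Range, children: Dict[str, Range]) -> str:
    """Pick where to send a hash: keep it, pass it to a child, or go up."""
    if own[0] <= hash_value < own[1]:
        return "self"
    for child, (low, high) in children.items():
        if low <= hash_value < high:
            return child
    return "parent"


if __name__ == "__main__":
    table = split_range(0, 16, {"self": 1, "A": 2, "B": 1})
    print(table)
    own = table.pop("self")
    print(next_hop(5, own, table), next_hop(20, own, table))
```

In this toy layout a hash outside every locally known range is handed to the parent, which mirrors the tree routing described above without claiming to match the actual routing table format.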

The root reassigns the hash space and the hash space allotment is updated as it percolates down from the root to the leaf nodes. If any stored record at a particular


Example routing table structure

node now does not lie within its assigned hash space, that record is migrated to the correct newer destination using the hash-routing table maintained at each node. Frequent addition of domains in the overall hierarchy can lead to frequent hash space reassignments and frequent migration of records. The hierarchy routing infrastructure and record location stability are improved using the domain-count reporting strategy described in 2.2.4. Every node also knows the root node's unicast connection details, which it may use in case of a dead parent scenario in order to locate the appropriate grafting point in the tree. The strategy is described in later chapters.


Steps in DHT domain addition

timeout, the parent will update the child count and this updated count will be propagated towards the root. This would lead to hash space reassignment from the root towards all the leaves. The stabilization strategy is described in 2.2.4.


2-5 shows an example case where a record and its shadow copy are routed through the DHT structure to the appropriate target domains. The routing is done using the keyword hash of the record. The hash value shown in the figure is arbitrary and is provided for clarity only. Routing is done based on the first 4 bits of the hash in the figure.

DHT record insertion example
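One way a primary record and its shadow copy could be sent to different domains is sketched below. It assumes that the shadow copy is keyed by the bitwise inverse of the keyword hash, an assumption suggested only by the "bit inversion" option of the `msd-probe' message mentioned elsewhere in this chapter; the function names are hypothetical.

```python
HASH_BITS = 128  # width of an MD5-sized hash value


def shadow_hash(hash_value: int, width: int = HASH_BITS) -> int:
    """Return the bitwise inverse of a `width`-bit hash (assumed shadow key)."""
    return hash_value ^ ((1 << width) - 1)


def routing_prefix(hash_value: int, nbits: int = 4, width: int = HASH_BITS) -> int:
    """Return the first `nbits` bits of the hash, as used for routing in the figure."""
    return hash_value >> (width - nbits)


if __name__ == "__main__":
    primary = 0xA << (HASH_BITS - 4)  # arbitrary hash whose first 4 bits are 1010
    print(format(routing_prefix(primary), "04b"),
          format(routing_prefix(shadow_hash(primary)), "04b"))
```

Under this assumption the two copies always differ in every routed bit, so they are placed in different parts of the hash space.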




36] that we floated. In this section we will describe the design we implemented for the proof of concept version that was developed.

The data structure components used in this database are shown in Figure 2-6. We have used a keyword meta-store to speed up the search process. The meta-store is a


Global sessions database design


2-1

Figure 2-7 shows the idea behind the database construction. It shows the earth coordinate system and the schematic representation of the geo-tagged database. Earth geographic locations can be addressed precisely using latitude and longitude coordinates. Latitudes vary from -90 to +90 along the south-north corridor. Similarly, longitudes vary from -180 to +180 along the west-east corridor. Latitudes are parallel to each other and are equidistant. Every degree of separation between latitudes equals 110.9 km in ground distance. The distance relationship between longitudes is not that straightforward because they converge at the poles. This relationship is further complicated as the earth is not a perfect sphere.
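To make the convergence of longitudes concrete, the sketch below computes the east-west ground distance covered by one degree of longitude at a given latitude. It uses the standard formula for an oblate ellipsoid with the WGS84 semi-axes a = 6,378,137 m and b = 6,356,752.3 m; this is offered as a commonly used formulation and may differ in form from the equation used in the dissertation, which did not survive in this text.

```python
import math

WGS84_A = 6_378_137.0   # equatorial radius in metres
WGS84_B = 6_356_752.3   # polar radius in metres


def metres_per_degree_longitude(latitude_deg: float) -> float:
    """East-west length of one degree of longitude at the given latitude.

    Uses (pi/180) * a * cos(beta), where tan(beta) = (b/a) * tan(latitude)
    and beta is the reduced latitude on the ellipsoid.
    """
    phi = math.radians(latitude_deg)
    beta = math.atan((WGS84_B / WGS84_A) * math.tan(phi))
    return (math.pi / 180.0) * WGS84_A * math.cos(beta)


if __name__ == "__main__":
    for lat in (0, 30, 60, 89):
        print(lat, round(metres_per_degree_longitude(lat) / 1000.0, 2), "km")
```

The value shrinks from about 111.3 km at the equator to about 55.8 km at 60 degrees latitude, which is why a fixed one-degree grid has cells of unequal ground area.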


Equation (2) shows the east-west distance for every degree change in longitude at a given latitude, with a = 6,378,137 m and b = 6,356,752.3 m.

Geo-tagged database design

Under the current grid map, where the major lines are latitudes and longitudes, each 1 degree apart, the earth can be mapped into a 180 x 360 grid space. Since almost 70% of the earth's surface is covered by water, 70% of the grid locations would naturally map to water bodies. Of the remaining 30% that is land mass, research shows only 50% is inhabited by humans. Therefore, we foresee only 15% of the full grid locations ever being used to group multicast sessions according to their geo-tags. A sparse-matrix implementation of the planetary grid therefore seems reasonable.
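A sparse representation of the level 0 grid can be sketched with a dictionary that only allocates cells that actually hold sessions. The class name `SparseGeoGrid`, the row and column convention, and the use of plain session identifiers are assumptions for this example rather than the data structure used in the proof of concept.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

GRID_ROWS, GRID_COLS = 180, 360  # one cell per degree of latitude and longitude


class SparseGeoGrid:
    """Dictionary-backed 180 x 360 grid; empty cells consume no memory."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[int, int], List[str]] = defaultdict(list)

    @staticmethod
    def cell(lat: float, lon: float) -> Tuple[int, int]:
        """Map a coordinate to (row, col); row 0 starts at -90, col 0 at -180."""
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError("coordinate out of range")
        row = min(math.floor(lat) + 90, GRID_ROWS - 1)   # fold +90 into the last row
        col = min(math.floor(lon) + 180, GRID_COLS - 1)  # fold +180 into the last column
        return row, col

    def add(self, lat: float, lon: float, session_id: str) -> None:
        self._cells[self.cell(lat, lon)].append(session_id)

    def sessions_at(self, lat: float, lon: float) -> List[str]:
        # .get() avoids creating an empty cell on lookup
        return list(self._cells.get(self.cell(lat, lon), []))

    def used_cells(self) -> int:
        return len(self._cells)


if __name__ == "__main__":
    grid = SparseGeoGrid()
    grid.add(29.65, -82.32, "session-1")  # hypothetical session near Gainesville, FL
    print(grid.cell(29.65, -82.32), grid.used_cells())
    print(int(GRID_ROWS * GRID_COLS * 0.30 * 0.50), "cells expected in use at most")
```

With the 15% estimate from the text, at most about 9,720 of the 64,800 cells would ever be populated, which is the case a dictionary handles well.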


Equation (2) governs the north-south resolution at tree depth n and equation (2) governs the east-west resolution at the same depth at a given latitude.

Any session that gets stored in either the global or local databases also keeps a corresponding geo-reference in the geo-tagged database. These references are maintained at the correct grid location in the level 0 structure and at the correct leaf linked-list in the tree rooted at the corresponding level 0 grid position. The garbage-collector thread, while removing stale sessions from the global and local databases, removes the corresponding reference from the geo-tagged database as well. Maintaining this additional structure allows new service paradigms to be supported that were previously not possible. A few such services, such as real-time `iReporting' and support for geo-specific and proximity search criteria, have already been mentioned earlier. Next let us look at the key algorithms needed to support a seamless user search experience and discuss the modalities of the supported operation.


2-8 shows the screenshot of the session registration tool implemented as part of the `proof of concept' demonstration of this service.

As part of the registration data, the content host must provide a valid location, a list of keywords that closely describe the session content, the scope of the session and other associated parameters. The MSD server, on receiving the registration request, creates a session record for every keyword specified in the request and stores those records under the `Local Records Database' regardless of the scope of the session.

If the session scope is global in nature, the MSDd server creates a `remote-register' [36] protocol message for each of the keywords and routes the request to remote domains using the hash routing table maintained locally. In case a few such requests are routed to `self', the MSD server creates a session record for that keyword


Screenshot - session registration tool


2-9 shows a simple registration scenario with the session scope set as `global'.

Session registration

The other kind of search that `mDNS' supports is the global search function. The end user presents the search query to the local MSDd server. Depending on the search criteria, the `mDNS' hierarchy routes and processes the query and the candidate session details are sent back to the requesting end user. In order to reduce the processing load on the servers, the search aggregation is left for the end user's search tool to perform. The details of this type of search are given next.


Session search

Figure 2-10 shows the general scheme behind global search support in the architecture presented. The end user makes the session search query to the domain local MSDd server. The MSD server parses the query and if the scope of the search is `administrative' only, then only the `Local Session Records' database is searched and the matching sessions are returned back to the requesting party.

However, things get somewhat complicated in the case of global search. A naive way would have been to flood the query to all participating domains. Because of the DHT tree structure and keyword routing using hash values, the search is more efficient. MSDd parses the query string and transforms a single search query into multiple `msd-probe' protocol messages [36] for each unique keyword present in the search


after receiving a `redirect' message, the search client is supposed to initiate the `ext-search' protocol exchange sequence with the target MSD server
clients can send `invalidate' back to the MSD server if the remote server no longer maintains session records for the requested keyword; in that case the MSD invalidates the stale cache entry for that keyword and sends `msd-probe' again to refresh the state entry
if the client decides that the remote MSD server is down, it can request the MSD in its local domain for backup server details, which the server finds out by sending `msd-probe' with bit inversion set to TRUE

3 shows what happens when a search query is received at the MSDd from a search client in the same domain. Let us take a look at what happens when the search query comes from an external search client. In that case, only globally scoped sessions are searched - because sending administratively scoped session details to a


4 shows what happens when an `ext-search' message is received at the target MSDd server.

only globally scoped sessions are returned
set keywordhash to hash(keyword)


37][38][39] but has not been incorporated in the proof of concept implementation nor in our proposed IETF RFC [36] yet.

Figure 2-11 shows the sequence of events after a node failure leading to temporary grafting of the child node at an appropriate ancestor node in the hierarchy. Algorithm 5 describes what happens in a parent node failure situation.

If the hierarchy root domain fails, then each of the children nodes will have no temporary `graft' option. After some period each of them will assume root responsibilities and the hierarchy will deteriorate into a disconnected forest. It is essential to provide sufficient redundancy at the root level, in the form of multiple backup MSD servers running at any given time, to prevent such a scenario from being realized. In a rare scenario, simultaneous URS and MSDd failure can also result in a failed domain even if that domain has multiple backup MSD servers. Such a scenario should be prevented at the root level at least.


Parent node failure recovery strategy

40] has been used to create as well as broadcast multicast session information to all interested parties. sdr uses SDP and SAP for packaging and transmitting this multicast session information on a well known globally scoped multicast channel. sdr has numerous limitations. The bandwidth restrictions enforced on SAP cause significant delays in session information reaching remote hosts. Also, every receiver must constantly listen to periodic announcements on sap.mcast.net and sdr clients multicast the session details for


Another problem with sdr and its underlying SAP implementation is caused by announcement bursts. The delay between burst cycles is greater than the multicast routing state timeout period. This is due to the default bandwidth restriction of 4000 bps in SAP. It leads to unnecessary control packets being sent in the network, recreating the already timed out multicast distribution tree in the core network.

41], the authors have tried to address some of the issues in sdr. They proposed a multi-tier mesh of relay proxy servers to announce multicast sessions using SSM to interested recipients. In their approach, every network operator that provides SSM service also runs a SAS (Session Announcement Server). They propose relaxing the bandwidth limit of SAP in local networks to a higher bandwidth limit. Further, each such SAS server links to a level 2 SAS server that runs in the core network. Every level 2 SAS server is interconnected in a mesh fashion with the others. Such an extensive mesh could cause significant network traffic in the core network with an increasing number of level 2 SAS servers deployed. They assume that only a few level 2 SAS servers would be needed in their scheme. Regardless, their scheme still remains a push based scheme and suffers from the limitations of SAP. There still remains a significant delay in session information being disseminated to remote hosts (albeit a much smaller delay compared to sdr). Their scheme also transmits the complete session details to every SAS server in the hierarchy on a periodic basis, causing unnecessary network traffic. Administrative burden is increased in this scheme as well, as every level 2 SAS server must be fed the connection details of every other level 2 SAS server.


42], the authors analyzed the drawbacks of sdr. They analyzed announcement delays caused by bandwidth restrictions in SAP. They found that, on average, the minimum announcement interval equals 5 minutes. Taking into account packet loss over the internet, they conjectured that users would potentially need to wait 10 or more minutes to build the sessions list. They proposed an architecture using an SDP proxy that supposedly is online for much larger intervals of time and builds the session list over a period. End users can then directly contact the nearest SDP proxy to get the session list. The problem still lies with the delay involved for a newly created session to be discovered by a remote user. Announcements from one SDP proxy to another are still rate limited. By the time a short duration session comes online and is discovered by a remote user, that session could potentially already be over. SDP proxies are not suitable for transient and short lived sessions.

43] architecture and protocol suite was developed by researchers to enhance information gathering and indexing from disparate sources over the internet in order to reduce network load. Although the original purpose of Harvest was very different, the architecture and the protocol suite can be modified slightly in order to serve as a multicast session discovery architecture. The Harvest architecture uses multiple Gatherers that reside close to the information source, and these interface with multiple Broker applications that provide a standard query interface. It uses replicated caches based on eventual consistency. Although a modified Harvest would not suffer from the bandwidth restrictions of SAP, the eventual consistency model could cause problems for short lived sessions created without preplanning. The eventual consistency model could also render several sessions undiscoverable for a significant duration of time. Further, replicated caches would result in duplicates being created, wasting resources.
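The 5 minute floor and the effect of the 4000 bps limit discussed above can be reproduced with the base interval rule from the SAP specification (RFC 2974), which sets the interval to the larger of 300 seconds and the time needed to send all announcements within the bandwidth limit. The sketch below omits the random offset that the specification adds to each transmission, and the announcement counts and sizes are made-up inputs.

```python
SAP_DEFAULT_LIMIT_BPS = 4000  # default SAP bandwidth limit, as stated in the text
SAP_MIN_INTERVAL_S = 300      # 5 minute floor


def sap_base_interval(num_ads: int, ad_size_bytes: int,
                      limit_bps: int = SAP_DEFAULT_LIMIT_BPS) -> float:
    """Base time in seconds between repeats of one announcement.

    interval = max(300, 8 * num_ads * ad_size / limit); the random jitter
    applied by real announcers is intentionally left out of this sketch.
    """
    return max(float(SAP_MIN_INTERVAL_S), 8.0 * num_ads * ad_size_bytes / limit_bps)


if __name__ == "__main__":
    for ads in (10, 500, 5000):
        minutes = sap_base_interval(ads, 500) / 60.0
        print(f"{ads} announcements of 500 bytes -> {minutes:.1f} min")
```

With a few sessions the floor dominates; as the number of announced sessions grows, the interval grows linearly, which is the scaling problem the proxy based proposals try to soften.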


44] the authors, in an effort to enable layered multimedia transmission to receivers with varying capabilities, proposed modifications to sdr. They proposed a two stage session directory service: a persistent server that caches SAP announcements and an ephemeral client that contacts the server to get the sessions list, thereby reducing the long latency normally associated with sdr. The usual problems with sdr still persist: users have to browse through a long session list in order to find a session of interest.

45], the authors have reflected on the limitations of a session directory based on the Session Description Protocol and the Session Announcement Protocol. They have argued that although the sdr approach is not scalable, session discovery can be made better by standardizing the additional attributes in SDP so that it can be organized and indexed in a separate server that would provide a Multicast Session Directory Service (MSDS) to end users. These MSDS servers can then disseminate information on a well known single multicast channel or multiple theme based multicast channels to the end users.

46] that has been developed as part of the Semantic Multicast project strives to provide a self organizing, hierarchical distributed cache where multimedia sources register their content information and the topical managers intelligently determine where in the hierarchy to store the session information. Their approach still makes use of SAP-like periodic announcements, which is a waste of bandwidth even though these announcements mainly carry the content managers' hierarchy information. It is still not clear how IDG enables end users to perform multicast session search based on multiple keywords. How is the additional network hardware


47], pastry [37] and bamboo [48], distributed mesh arrangements as in tapestry [49], hierarchical structure such as Kademlia [38] and spatial DHT as in CAN [39], where routing is done using cartesian coordinates. The P2P DHT schemes allow for scalability, self organization, and fault tolerance. Yet they may suffer from issues resulting from churn [50] and non-transitivity [51] in connections among participating nodes. Researchers have even proposed unstructured DHT [52] overlays that provide the benefits of structured DHTs. Let us now briefly look at some of these DHT schemes. In Chapter 5 we will discuss reasons to develop our own DHT scheme and explain why we chose not to use current DHT schemes for the mDNS architecture.

47] is a distributed P2P architecture that allows users to store keys and associated data in an overlay. Given a key, it provides a service that maps that key onto an existing node in the overlay. Each node maintains information about routing keys to the appropriate node in its local finger table. Finger tables are constructed based on local interaction among participating nodes, and a node need not know the global state of the overall system. The routing table (finger table) size in a stable state grows at


53] is a free and shared DHT deployment that can be used by a multitude of applications. The design goals focus on adequate control over the storage allocation mechanism, so that each user/application gets its fair share of storage, and on somewhat general API requirements, so that the overlay can be used by a broad spectrum of applications. It provides persistent storage semantics based somewhat on the Palimpsest shared public storage system [54]. The implementation provides a simple put/get based API for simple application development and a more sophisticated API set called ReDiR. The main focus in this DHT scheme is starvation prevention and fair allocation of storage to applications. Competing applications' keys are stored using unique name-spaces assigned to each application. Keyword routing in OpenDHT is tree based and is done hierarchically. Details can be found in [55].

49][56] is a DHT overlay where routing is done according to the digits in the node address. At each routing step, the message is routed to a node whose address has a longer matching address prefix than the current node. The routing scheme is very similar to the scheme presented by Plaxton [57], with support for a dynamic node environment. The authors propose using salts to store objects at multiple roots, thus improving availability of data in their scheme. They use neighbor maps to incrementally route messages to the destination ID digit by digit. The neighbor map entry in tapestry has space complexity O(log_b N), where `b' is the base for node IDs. The Tapestry scheme uses several back pointers to notify neighbors of node additions or deletions. Several successful applications that use tapestry for message routing have been developed. Notable


58], which is a wide-area persistent distributed storage system meant to scale the globe, and Bayeux [59], an application-level multicast protocol

37] is an application layer overlay developed in collaboration between Rice University and Microsoft Research. Each node in pastry is assigned a unique 128 bit ID that indicates its position in the circular ID space. Every pastry node maintains a routing table, a neighborhood set and a leaf set that help the overlay to deal with intermittent node failures. The neighborhood set contains a predefined number of nodes that are closest to the given node based on some set proximity criteria, whereas the leaf set contains nodes whose node IDs are closest to the current node's ID. The neighborhood set is not used in routing but is used to guarantee locality bounds in routing. The routing table has ⌈log_{2^b} N⌉ rows with 2^b - 1 entries in each row. `b' is a configuration parameter typically set to 4 by the authors. The routing scheme is prefix based and is very similar to the one adopted by Tapestry [49][56]. Several successful applications have been developed that use pastry as their routing base. Notable among them are PAST [60] and SCRIBE [61]. A global bootstrapping service [62] for any application layer overlay has also been proposed that uses pastry as its routing base.

38] has several attractive features compared to other application overlays. It minimizes the number of configuration messages that nodes must exchange in order to find out about each other. It uses the XOR of node IDs as a measure of distance between two nodes. Because of the symmetric nature of XOR, nodes participating in a Kademlia overlay learn useful routing information from the keyword queries received. Other DHTs lack this ability. Additionally, Kademlia uses a unified routing algorithm from beginning till end regardless of the proximity of intermediate nodes to the target node. This simplifies the routing algorithm quite significantly. Nodes are treated as leaves in a binary tree where each node's position in the tree is determined by the shortest unique


The routing table in Kademlia is arranged in k-buckets of nodes whose distance from the node itself lies between $2^i$ and $2^{i+1}$, for $0 \le i < 160$. 'k' is a design parameter which the authors chose as 20. The routing table is itself logically arranged as a binary tree whose leaves are k-buckets. Each k-bucket covers some range of the ID space, and together they cover the entire 160-bit ID space.

CAN [39] is an overlay where the node space is a d-dimensional coordinate space. The coordinate space at any time is completely partitioned among all participating N nodes. Each key in CAN is mapped to a coordinate in the CAN coordinate space and thus is mapped to the node managing the zone within which this key lies. Routing is done by forwarding the message to the neighboring node whose coordinate is closest to the destination coordinate. A CAN node maintains a coordinate routing table that holds the virtual coordinate zone of each of its immediate neighbors only. If there are 'n' nodes that divide the whole coordinate space into n equal zones, then the average routing path length in CAN is $(d/4)(n^{1/d})$ hops, and individual nodes maintain $2d$ neighbors for a d-dimensional coordinate space. The authors propose using multiple 'realities', along with multiple peers in each zone and multiple hash functions, for routing optimizations and for improving the overall availability of data in their scheme.
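The Kademlia distance metric and k-bucket placement described above can be illustrated with a minimal sketch. The function names are hypothetical and the 4-bit IDs in the usage lines are toy values chosen for readability; only the XOR metric and the $2^i \le \text{distance} < 2^{i+1}$ bucket rule come from the text.

```python
ID_BITS = 160  # Kademlia node IDs are 160 bits wide


def xor_distance(a: int, b: int) -> int:
    """Distance between two node IDs: their bitwise XOR read as an integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Return i such that 2**i <= distance(own_id, other_id) < 2**(i + 1)."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not place itself in a k-bucket")
    return d.bit_length() - 1


if __name__ == "__main__":
    me, peer = 0b1011, 0b1110  # toy 4-bit IDs, for illustration only
    print(xor_distance(me, peer))   # 0b0101
    print(bucket_index(me, peer))   # bucket covering distances 4..7
    # XOR is symmetric, so both nodes agree on how far apart they are.
    print(xor_distance(me, peer) == xor_distance(peer, me))
```

Because the metric is symmetric, a node that receives a query from a peer learns a contact that is just as useful for its own routing table, which is the property the text credits to Kademlia.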


The architecture, as it has been implemented in the application layer as an overlay, achieves independence from lower layer details and is incrementally deployable. Even if an 'mDNS' domain is not linked to the global hierarchy, it can still provide valuable directory services to the domain's local end users. In conjunction with the URS, it can allow users to search and bookmark their popular multicast content for later viewing.


[32]. Use of domain names and URLs has made the Internet more usable for end users.

Most of the content made available to end users on the Internet is static and is available long term. A web page hosted somewhere will most likely be found at the same location for many weeks to come. This quasi-permanence of the data allows search engines like Yahoo and Google to crawl the web and index the content. These web indexes can be searched by end users to locate the content they desire. The existence of web indexes and the DNS service has, without argument, made the Internet a much more usable technology today compared to its early days.

The scenario for IP multicast is totally different. Group addresses for content delivery to interested users are not permanent. Further, the content streams transmitted over such multicast groups are typically very dynamic in nature. Generally speaking, IP multicast traffic is not crawler friendly, and the transient nature of the sessions makes indexing almost impossible. Content discovery similar to that provided by web search engines, which would allow end users to locate a session of interest, is nonexistent in the Internet. The approach provided in Chapter 2 addresses that.


Figure 3-1 shows the location and names of the 13 DNS root servers. These servers are replicated for security and redundancy reasons.


Figure 3-1. Location and names of DNS root servers [source: ICANN]

[63][64] to the client's machine. The client-side DNS server asks the root servers for the address of the respective TLD DNS server. The TLD DNS server has an entry that points to the authoritative DNS server for the domain that has to be resolved. The local DNS client then queries the authoritative DNS server and gets the IP address of the mnemonic (domain name) to be resolved.

DNS name resolution is done using both recursive and iterative resolution. Resolution proceeds iteratively until the query reaches the authoritative name server and, if there are local servers below that, the resolution proceeds recursively until the address record is located and sent back to the requesting DNS client. A DNS server maintains several record types in its internal database. Let us take a brief look at some of the common records that are stored as part of the DNS database. Table 3-1 shows the most common record types. Now that we have seen the basics of a DNS server, let us see in detail how a URS is designed and how it achieves its intended goals.
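The iterative part of the resolution described above (root server, then TLD server, then authoritative server) can be sketched with in-memory tables. This is a toy illustration, not a DNS implementation: every server name and address below is a made-up placeholder, and real resolvers also handle caching, multi-label zones, and record types other than A.

```python
# Hypothetical referral data; none of these names are real servers.
ROOT_REFERRALS = {"edu": "tld.edu.example"}
TLD_REFERRALS = {"tld.edu.example": {"example.edu": "ns1.example.edu"}}
AUTHORITATIVE = {"ns1.example.edu": {"www.example.edu": "192.0.2.80"}}


def resolve_iteratively(hostname: str) -> list:
    """Follow referrals root -> TLD -> authoritative and return each step."""
    labels = hostname.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])

    tld_server = ROOT_REFERRALS[tld]                  # step 1: ask a root server
    auth_server = TLD_REFERRALS[tld_server][domain]   # step 2: ask the TLD server
    address = AUTHORITATIVE[auth_server][hostname]    # step 3: ask the authority
    return [("root", tld_server), (tld_server, auth_server), (auth_server, address)]


if __name__ == "__main__":
    for asked, answer in resolve_iteratively("www.example.edu"):
        print(f"asked {asked}: got {answer}")
```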


Table 3-1. Common DNS record types

Record Type   Name             Value
A             hostname         IP address
NS            domain           authoritative DNS server name
CNAME         alias hostname   canonical hostname
MX            alias hostname   canonical name of mail server


This record helps in the 'mDNS' URI name resolution process. A URS maintains records only for sessions created in its own domain. The uniqueness of the 'URS Identifier' value is enforced only with respect to that domain. That is, no two sessions created in that domain and registered with the URS will have the same 'URS Identifier'.

The content provider in a domain is required to register the session details, along with a unique URS identifier, with the URS in his/her domain. If the session's connection parameters change in the future, he/she is required to update the URS record immediately. This update process can be automated in the session management tool


Figure 3-2 shows the steps involved.

Let us say the user is trying to access a multicast video stream that has an 'mDNS' 'URS-identifier' set to gators and so the URS locates the relevant record and sends it back to the end user. This record has all the necessary parameters needed by the multicast stream receiver to join the relevant session. If the record gators was not found at the target URS, the name resolution would have failed. In that case a 'Resource Not Found' type error message will be displayed at the end user's machine.
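The lookup just described can be sketched as follows, under stated assumptions. The URS hostname, the record fields, and all addresses are hypothetical placeholders (the source does not give the record layout), and the DNS and URS are modeled as plain dictionaries; only the identifier gators and the 'Resource Not Found' outcome come from the text.

```python
class ResourceNotFound(Exception):
    """Raised when name resolution fails, mirroring 'Resource Not Found'."""


# Hypothetical data standing in for DNS and for one domain's URS database.
DNS_RECORDS = {"urs.example.edu": "192.0.2.10"}
URS_DATABASES = {
    "192.0.2.10": {
        "gators": {"group": "232.1.1.1", "port": 5004, "source": "192.0.2.50"},
    },
}


def resolve_session(urs_hostname: str, urs_identifier: str) -> dict:
    """Resolve the URS through DNS, then ask that URS for the session record."""
    urs_address = DNS_RECORDS.get(urs_hostname)
    if urs_address is None:
        raise ResourceNotFound(f"no DNS record for {urs_hostname}")
    record = URS_DATABASES[urs_address].get(urs_identifier)
    if record is None:
        raise ResourceNotFound(f"{urs_identifier} is not registered at {urs_hostname}")
    return record


if __name__ == "__main__":
    print(resolve_session("urs.example.edu", "gators"))
    try:
        resolve_session("urs.example.edu", "tigers")
    except ResourceNotFound as err:
        print("Resource Not Found:", err)
```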


Figure 3-2. Typical steps in 'mDNS' URI name resolution

MSD servers use these parameters to set up the necessary communication channels in order to join the 'mDNS' service hierarchy. The parent domain's URL string is needed in




In the previous two chapters, Chapter 2 and Chapter 3, we have seen two different issues facing IP multicast acceptance by an average end user: first, the ability to locate relevant multicast streams and second, a convenient mechanism to remember, bookmark, and access a favorite stream in the future. Chapter 2 dealt with a structured proposal that allows an average user to locate a multicast stream along the lines of keyword-based web searches. Chapter 3 proposed a mechanism that would allow multicast streams to be assigned a mnemonic name, just like web page URLs and domain names. Use of mnemonics will greatly improve the recall-ability of stream names compared to the unwieldy network IP addresses typically assigned to such streams. Those two chapters dealt with the issues in isolation from each other. In this chapter we will present the complete system architecture that merges the resources described in Chapters 2 and 3 into a seamless global system that improves the overall usability of multicast technology.


Figure 4-1. Typical mDNS domain components

Figure 4-1 shows the typical components that make up an 'mDNS' domain. Each domain has a DNS server set up and one or more replicated URS. If the URS is replicated for load balancing purposes, this is achieved via the DNS load balancing feature. The URS port number is assumed to be fixed and well known, and possibly an IANA-assigned value. The domain can also have one or more MSD servers running. Other MSD


As mentioned in Chapter 3, the URS acts as a bootstrapping mechanism for MSD servers. The system administrator needs to configure PMCAST, CMCAST, the network's IGMP support, and the parent's domain URL at the time of URS startup.

PMCAST is the globally scoped multicast group on which this domain receives communication from its parent domain. If ASM mode is supported, then any communication to the parent can be sent using this channel; otherwise, upstream communication must be done through unicast. The PMCAST value of a particular domain is the same as the CMCAST value in the parent domain.

CMCAST is the globally scoped multicast group over which a domain sends communication to its children domains. If ASM mode is supported, then the child nodes can communicate back to this domain over the same group; otherwise, they must use unicast to communicate upstream. The CMCAST value of a particular domain is the same as the PMCAST value in any child domain.

Apart from hard-coded configuration parameters, the URS also maintains several soft-state parameters. Important among them are


Figure 4-2 shows the communication overlay in the ASM network scenario. Since the CMCAST channel of the parent domain is the same as the PMCAST channel in the child domain, and flow of communication is allowed along both parent-to-child and child-to-parent paths, parents and all children domains join the common multicast channel for communicating with each other.

Figure 4-2. A typical mDNS hierarchy in ASM network

The 'mDNS' structure is capable of operating in a mixed multicast environment as well. A network domain that supports both ASM and SSM multicast modes of operation and supports both (S,G) and (*,G) joins as well as deploys any required supporting


Figure 4-3 shows a scenario where two hybrid multicast networks connect disparate multicast networks.

Figure 4-3. An mDNS hierarchy in mixed network operation mode

The domain's URS helps decide what sort of multicast mode the MSD server will operate in. The inclusion of the parent's domain URL string allows the URS to contact the parent domain's URS and get the relevant network support information. In cases where communication between parent and child is not possible using multicast, a unicast link is set up between the two domains after a preset communication timeout (soft-state refresh). Thus, in scenarios where no hybrid network type exists and there is no consistent network support for multicast, the communication hierarchy will degenerate gracefully to unicast links between parent and children domains. Let us now see in some detail how caching is used in 'mDNS'.


In mDNS, once the hash-space allocation and hash-routing construction phase stabilizes, the MSD connection details become stable as well. Unless many domains join and leave the mDNS hierarchy in an arbitrary fashion, the hierarchy as well as the hash space allotment remains stable. One way a target MSD may change, even if the hierarchy itself is stable, is if the designated MSD server fails. In this case, if a backup MSD server is running, it will soon become the designated MSD server (after a fresh leader election) and thus the IP address will change. But we expect such cases to be very rare. These arguments make MSD connection details an excellent candidate for caching.

With caches in place, when an end user requests a keyword search for multicast sessions, the domain-local MSD server checks the cache. If there is a cache hit, it immediately sends the cached connection details for the target MSD server to the requesting end user. The end user tries to connect to the remote target MSD server; if it succeeds, the delay incurred is reduced significantly. If it fails, most likely due to changed connection information in the target domain (due to primary MSD server failure), or if the target domain is no longer responsible for the keyword due to more recent hash space reassignments (likely caused by network topology changes), the end user prompts the domain-local MSD server to invalidate the stale entry. The original two-pass protocol is then used, which refreshes the stale entry, and the process continues from there.
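The cache-then-fallback flow above can be sketched in a few lines. This is a simplification under stated assumptions: the end user and the domain-local MSD server are collapsed into one function, and `two_pass_lookup` and `try_connect` are hypothetical callables standing in for the two-pass protocol and the connection attempt, neither of which is specified here.

```python
def find_target_msd(keyword, cache, two_pass_lookup, try_connect):
    """Return (connection details, path taken) for the MSD serving a keyword."""
    cached = cache.get(keyword)
    if cached is not None:
        if try_connect(cached, keyword):
            return cached, "cache-hit"
        # Connection failed or the target is no longer responsible:
        # invalidate the stale entry before falling back.
        del cache[keyword]
    fresh = two_pass_lookup(keyword)  # original two-pass protocol
    cache[keyword] = fresh            # refresh the cache entry
    return fresh, "two-pass"


if __name__ == "__main__":
    # Illustrative addresses only; the old designated MSD server has failed.
    cache = {"football": "203.0.113.5:9000"}
    reachable = {"203.0.113.9:9000"}
    result = find_target_msd(
        "football",
        cache,
        two_pass_lookup=lambda kw: "203.0.113.9:9000",
        try_connect=lambda addr, kw: addr in reachable,
    )
    print(result, cache)
```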


Table 4-1. Typical cache structure

keyword    access time      freq   score       ip:port
           1249607331102    534    247.00424   abc:q
football   1249607331102    61     57.804245   def:w
n          1249607377712    712    318.67035   ghi:e
beach      1249607339173    11     37.884953   jkl:r

[65][66] and Least Frequently Used (LFU) [65] caching strategies, we used a hybrid caching strategy. Table 4-1 shows typical cache entries in our hybrid caching strategy.

Assume the current system time is 1249607590678 and the timeout value is 3600000 milliseconds (1 hour). The above table entries correspond to values calculated using $\alpha = 0.4$. The score component for any cache entry is computed using

$$\text{score} = \alpha \cdot \text{freq} + (1 - \alpha) \cdot \frac{\text{timeout} - (t_{curr} - t_{lastaccess})}{60000}$$

Now that we have seen global system integration in some detail, let us understand how a domain-specific search is supported in 'mDNS'. Later we will find out in what situations the 'mDNS' services to an end user may deteriorate or even fail completely.
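As a check on the hybrid score, the sketch below recomputes the scores for the four cache entries listed in the table. It assumes the score is a frequency term weighted by 0.4 plus a recency term weighted by 0.6, where the recency term is the time remaining before timeout (3600000 ms) divided by 60000; that divisor and the parameter names are inferred here because they reproduce the tabulated scores, not quoted from the source.

```python
def hybrid_score(freq, last_access_ms, now_ms,
                 timeout_ms=3_600_000, alpha=0.4, scale_ms=60_000):
    """Blend an LFU-style term (freq) with an LRU-style term (time left)."""
    remaining_ms = timeout_ms - (now_ms - last_access_ms)
    return alpha * freq + (1 - alpha) * remaining_ms / scale_ms


if __name__ == "__main__":
    now = 1249607590678
    entries = [  # (last access time in ms, frequency) from the cache table
        (1249607331102, 534),
        (1249607331102, 61),
        (1249607377712, 712),
        (1249607339173, 11),
    ]
    for last_access, freq in entries:
        print(round(hybrid_score(freq, last_access, now), 5))
```

A frequently used entry keeps a high score even when it has not been touched recently, while a rarely used entry is carried mostly by its recency term, which shrinks toward zero as the timeout approaches.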


A user must specify which domain to search using the 'mDNS' URL string. The domain URL is resolved first, using the DNS and URS name resolution algorithm described in Chapter 3. Once the URL has been resolved, the client can query the URS of the remote domain and get the remote MSD_d connection details. It can then query the MSD directly, providing the search string, and wait for the results to be returned (if any).


DNS failures are generally very rare. In the past there were some TLD poisoning attacks, but they were largely ineffective because of caching and replication of the TLD DNS infrastructure.

URS failure can also affect normal 'mDNS' functionality in another way. The URS maintains the parent domain's URL string and is therefore able to query the parent's URS for details such as the IP address and port of the MSD_d server in the parent's domain. These details might be needed in case the MSD_d server is unable to receive








The 'mDNS' architectural outline and specifics provided in this dissertation have achieved most of the goals that were identified early on. Now let us assess how well the goals were met and the quality of the services provided, and identify shortcomings and areas for improvement in the architecture.


Reliability in the face of domain failures may be improved. Although storing a shadow session record at a different location helps alleviate the issue, a more robust domain failure safeguard algorithm could be designed. The authentication and security aspects of inter-domain communication, especially validation and verification of control messages, have to be worked on. However, with the transition from ASM to SSM mode, some of the more flagrant security issues in IP multicast, such as spurious cross traffic and an unhindered sender policy where the sender need not be part of the multicast group to which it sends data, should resolve themselves.


Figure 5-1 shows a screenshot of the auto simulator program. The auto simulator program started the required number of virtual DNS applications with appropriate configuration parameters, and started the URS and MSD server for each virtual DNS server instance, thus creating a simulated set of 'mDNS' domains. The virtual DNS server parameters were set so as to link the domains appropriately according to the simulation domain hierarchy scheme. The virtual DNS software was capable of domain URL translation in an iterative manner. It also supported basic protocol handling capabilities that allowed other programs to query certain simulation parameters over TCP/IP sockets.


Figure 5-1. Screenshot - mDNS auto simulator program

The host environment for running the simulation was a Windows machine with the following configuration


Figure 5-2 shows a screenshot of the tool.

Figure 5-2. Screenshot - mDNS latency measurement tool


An alternate approach would have been to use multiple instances of virtual machines (VMs) to simulate each domain. This approach is highly taxing in terms of host machine resources. Each VM consumes significant resources, and hence one is limited to running only a few instances on any given machine. Using our strategy, we are able to simulate 10-15 domains on our test machine without incurring a significant performance penalty.

[67] for our simulation.

The multicast session directory (MSD) software was modified to simulate inter-domain network latency for use with the session discovery latency measurement experiments. Inter-domain latencies were set randomly between 25 ms and 100 ms. Figure 5-3 shows the three domain topologies used in the simulation data collection.


Figure 5-3. Various network topologies chosen for simulation

The above figure shows scenario 1, which is a somewhat balanced domain arrangement in the hierarchy. Scenarios 2 and 3 show the two extremes of the domain linkage scheme. Scenario 2 is the two-level scenario where there is only one parent domain (flat arrangement, tree of height two), and scenario 3 is the other extreme where all the domains are linked in a linear order (tree of height 10). In the figure, the direction of an arrow shows the "is a child of" relationship; for example, A → B means A is a child of B.

For all three scenarios we configured the simulation controller to start the virtual domains according to the permutation [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] and inter-domain delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. The value in permutation location i acts as a pointer to the position in the delay list holding the delay to wait before starting the next domain. The simulation controller acts as follows: it first starts virtual domain 10, looks at the 10th place in the delay list, finds the value 30, waits for 30 seconds before starting virtual domain 4, and so on. Another set of values that we used in our simulation was the domain start-up permutation [10, 1, 4, 5, 2, 3, 6, 7, 8, 9] and delay values [5, 5, 5, 5, 5, 540, 5, 5, 5, 5].

Table 5-1 shows a partial list of values of measured system parameters for the scenario 1 hierarchy, using the domain startup permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] and inter-domain startup delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. Each row


[68]. Each experiment was run three times and the values represent the averages across these runs. The tables also give the standard deviation values across these runs.

Figure 5-4. Average hash skew - scenario 1
Figure 5-5. Skew standard deviation - scenario 1

We collected the total number of routing table updates that any domain underwent before stabilizing and the time taken for routes to stabilize, and measured the hash-space assignment 'skew' and the control bandwidth used among the participating domains. Hash 'skew' for each domain is measured using this formula: Hash skew = |(Hash frac)_assigned − 1 …


Table 5-1. Partial simulation data for scenario 1 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]

BETA   ALPHA   SKEW   ST-DEV   CBW   ST-DEV   R-SWITCH   ST-DEV   ST-TIME   ST-DEV   SCORE   ST-DEV


Figure 5-6. Average control bandwidth - scenario 1
Figure 5-7. Control bandwidth standard deviation - scenario 1
Figure 5-8. Average route switches - scenario 1
Figure 5-9. Route switch standard deviation - scenario 1

Figure 5-4 shows the average hash-skew plot per domain for simulation hierarchy type 1. Figure 5-5 shows the standard deviation in the hash skew values computed over three experimental runs. Figure 5-6 shows the average control bandwidth usage per domain in the network to maintain the domain hierarchy for simulation hierarchy type 1. Figure 5-7 shows the standard deviation in the control bandwidth used per domain for three runs of the simulation.

Figure 5-8 shows the average routing table switches per domain for different values of α and β for hierarchy type 1. Figure 5-9 shows the standard deviation in routing switches


Figure 5-10. Average route stabilization time - scenario 1
Figure 5-11. Route stabilization time standard deviation - scenario 1

for different values of α and β among three experimental runs for hierarchy 1. Figure 5-10 shows the routing table stabilization time for different values of α and β for the same domain hierarchy structure.

Table 5-2 shows the partial data values for experiments done on domain hierarchy setup type 2, with the domain starting order permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3] and inter-domain startup delay values [5, 5, 5, 10, 30, 600, 5, 5, 300, 30]. Table 5-3 shows the partial data values for domain hierarchy scenario 3, with the same domain startup order and delay parameters as before.

Figure 5-12. Average hash skew - scenario 2
Figure 5-13. Skew standard deviation - scenario 2


Table 5-2. Partial simulation data for scenario 2 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]

BETA   ALPHA   SKEW   ST-DEV   CBW   ST-DEV   R-SWITCH   ST-DEV   ST-TIME   ST-DEV   SCORE   ST-DEV


Table 5-3. Partial simulation data for scenario 3 hierarchy for permutation list [10, 4, 5, 6, 1, 2, 7, 8, 9, 3]

BETA   ALPHA   SKEW   ST-DEV   CBW   ST-DEV   R-SWITCH   ST-DEV   ST-TIME   ST-DEV   SCORE   ST-DEV


Figure 5-14. Average control bandwidth - scenario 2
Figure 5-15. Control bandwidth standard deviation - scenario 2
Figure 5-16. Average route switches - scenario 2
Figure 5-17. Route switch standard deviation - scenario 2

Figure 5-12 shows the average hash skew per domain for different values of α and β for simulation scenario 2. Similarly, Figure 5-14 shows the plot of average control bandwidth in bytes/second for simulation scenario 2, Figure 5-16 depicts the plot of average routing table switches, and Figure 5-18 shows the stabilization time (in seconds) for routing flux to subside for simulation scenario 2. Figures 5-20, 5-22, 5-24, and 5-26 show the same figure types but for simulation scenario 3. Our simulation run for each type covers α and β values between 0.1 and 2.0 with step size 0.2, with ….


Figure 5-18. Average route stabilization time - scenario 2
Figure 5-19. Route stabilization time standard deviation - scenario 2
Figure 5-20. Average hash skew - scenario 3
Figure 5-21. Skew standard deviation - scenario 3

… 5-3. The data values represented in Table 5-4 are in milliseconds.

Figure 5-28 shows all the parameters represented as horizontal bars. The x-axis denotes the time in milliseconds. Figure 5-29 shows the maximum, minimum, and average latency values for experiments conducted with the number of domains ranging from 1 through 5. The x-axis shows the number of domains and the y-axis shows time in milliseconds.


Figure 5-22. Average control bandwidth - scenario 3
Figure 5-23. Control bandwidth standard deviation - scenario 3
Figure 5-24. Average route switches - scenario 3
Figure 5-25. Route switch standard deviation - scenario 3

Table 5-4. Latency measurements summary

Domains   Max    Min   Average   Std. dev.   Median
1         1060   503   548.64    113.81      508
2         4736   503   1198.08   994.92      865
3         4904   503   1315      1034.11     934
4         4937   503   1493.92   1181.18     980.5
5         4990   504   1713.22   1217        1088


Figure 5-26. Average route stabilization time - scenario 3
Figure 5-27. Route stabilization time standard deviation - scenario 3
Figure 5-28. Summary chart for latency experiments

Figures 5-30 and 5-31 show the median and the average latency values in milliseconds. The x-axis shows the number of domains.

The significant jump in discovery latency from one domain to more than one is due to the 'MSD PROBE' and 'REDIRECT' protocol steps involved in domain-external searches, as compared to the domain-local search that is the case when simulating with just one domain. In the experiments we performed, session registration was performed at a randomly chosen domain and search initiation was done immediately after


Figure 5-29. Range chart for latency experiments
Figure 5-30. Median latency
Figure 5-31. Average latency

registration, again at a randomly chosen domain/node. These two random selections were independent of each other.

Results Interpretation


Figure 5-32. Average of weighted scores - scenario 1
Figure 5-33. Standard deviation of weighted scores - scenario 1
Figure 5-34. Average of weighted scores - scenario 2
Figure 5-35. Standard deviation of weighted scores - scenario 2


Figure 5-36. Average of weighted scores - scenario 3
Figure 5-37. Standard deviation of weighted scores - scenario 3

Figure 5-32 shows the average weighted scaled scores for simulation scenario 1 over three experimental runs. The figure has been drawn using weights of 0.5 for routing table switches, 0.3 for route stabilization time, and 0.2 for hash-skew value. Figure 5-34 shows the average weighted scaled scores for scenario 2, and Figure 5-36 shows the averages of weighted scaled scores for simulation scenario 3. The x-axis represents … values from 0.1 to 2.0, the y-axis represents … ranging from 0.1 to …, and the z-axis represents the scaled weighted score.

Looking at the weighted score plot, one can see that for the scenario 1 simulation setup, the best performance is achieved if α, β ∈ [1.8, 2.0] with …. For scenario 2, the optimal system performance is achieved at … ∈ [0.4, 1.0] and … ∈ [1.8, 2.0], to report some of the values. For scenario 3 the system performed better with α, β ∈ [1.2, 2.0] with ….

Considering the simulation results, it is clear that the choice of α and β depends on the network topology. A system administrator is free to choose values of his/her liking, although it is advisable to follow the common selection guidelines for the full hierarchy. In order to maintain global routing table stability, a relatively high value of … is suggested, and for routing table stability at the subtree level, a higher value of … is advised.
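The weighted scaled score can be illustrated with a minimal sketch using the weights given above (0.5 for routing table switches, 0.3 for route stabilization time, 0.2 for hash skew). How the raw measurements were scaled is not stated, so min-max scaling to [0, 1] is assumed here, with lower scores meaning better performance; the metric names and the sample measurements are hypothetical.

```python
WEIGHTS = {"route_switches": 0.5, "stabilization_time": 0.3, "hash_skew": 0.2}


def min_max_scale(values):
    """Scale values to [0, 1]; an assumed stand-in for the unstated scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(runs):
    """Combine the three scaled metrics of each run into one weighted score."""
    scaled = {m: min_max_scale([run[m] for run in runs]) for m in WEIGHTS}
    return [sum(WEIGHTS[m] * scaled[m][i] for m in WEIGHTS)
            for i in range(len(runs))]


if __name__ == "__main__":
    # Hypothetical measurements for three parameter settings.
    runs = [
        {"route_switches": 12, "stabilization_time": 640.0, "hash_skew": 0.08},
        {"route_switches": 7, "stabilization_time": 655.0, "hash_skew": 0.05},
        {"route_switches": 20, "stabilization_time": 700.0, "hash_skew": 0.11},
    ]
    for score in weighted_scores(runs):
        print(round(score, 3))
```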


The session discovery latency experiments demonstrate the strength of our proposal. Compared to previous approaches [46][44][40], where session discovery by interested receivers could take anywhere from a few minutes to a few hours (ignoring other troubles in such schemes), the latency in our scheme is on the order of milliseconds to a few seconds. The results are for a keyword usage pattern that rendered caching ineffective. The big jump in the median and average latency values is due to the extra protocol delay incurred by the MSD PROBE and REDIRECT steps involved in inter-domain searches in a cache-miss situation. With the effects of caching kicking in, the average query latency will come down significantly.


Since the linked list size for every occupied longitude corresponding to a latitude could be at most 360, a linear traversal to find the correct longitude location in the list would also take O(1) time.

Using equations (5) and (5) and specified values for k and d, the maximum depth h of the tree rooted at the linked-list entry representing a (latitude, longitude) pair can be computed using: … At this depth, the grid's horizontal resolution can be computed using equation (5) by replacing n in that equation by h'. Because the horizontal east-west distance decreases as one moves towards the poles, the number of sub-grids N that we must traverse laterally in the east-west direction in our sparse-matrix representation can be computed using … And hence the number of possible linked lists at tree leaf nodes that one must traverse to find candidate session records can be easily found using $N_{\text{leaf grids}} = N \times k^{h - h'}$.


… (5)

where C is a constant that can vary between 1 and 4 depending on the proximity of the search criterion (read: coordinates) to the grid edges or corners of the target quadrant at tree height h'. The search complexity can be reduced greatly if we replace the leaf linked lists with hash tables and use perfect hashing functions.

Suppose the root node has k child domains such that the sum of MSD-designate counts at the root node is N. Let us denote the node count reported to the root node by each child node by $n_i$. Thus … Since the MD5-128 hash is used, the keyword hash space that must be distributed among participating nodes is $2^{128}$. As we use prefix routing in mDNS, suppose the number of significant bits needed to route appropriately is m. And therefore


Further, each child node reallocates its assigned hash space among its children and itself; the space must be divided equally into $n_i$ shares, and thus each participating domain's designated MSD server's share comes out to … This of course is valid provided the domain hierarchy remains stable over time. As new domains may be added and some domains may leave the mDNS hierarchy over time, there could be times when the above equitable distribution is violated for short durations. This situation should not arise frequently and, we conjecture, would mostly occur during the bootstrapping process. This minor turbulence in stable equitable distribution occurs because of the way Algorithm 2 (see Chapter 2) has been designed to minimize routing instability and to reduce frequent routing flux.

Let us analyze the workload due to routing of search and registration requests to the appropriate MSD servers. Clearly, a node that is higher up in the tree hierarchy must carry out more routing responsibilities than a node that is located close to the leaf domains. At any node i in the routing tree, suppose there are m children domains; then the routing load at that particular node i becomes … where $count_j$ is the MSD count propagated to node i from its child j sub-domain. This of course assumes stable and equitable hash space distribution.
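The equitable-share argument earlier in this section can be checked with a small worked sketch. It rests on one assumption that is not confirmed by the surviving text: that the root hands each child a slice of the $2^{128}$ hash space proportional to the count $n_i$ that child reported. Under that assumption, dividing each child's slice into $n_i$ equal shares gives every designated MSD server the same share, $2^{128}/N$. The function name and the example counts are illustrative.

```python
from fractions import Fraction

HASH_SPACE = 2 ** 128  # size of the MD5-128 keyword hash space


def per_server_shares(child_counts):
    """Share of the hash space per designated MSD server under each child."""
    total = sum(child_counts)  # N, the sum of the reported counts n_i
    shares = []
    for n_i in child_counts:
        child_slice = Fraction(HASH_SPACE * n_i, total)  # assumed proportional
        shares.append(child_slice / n_i)                 # split into n_i shares
    return shares


if __name__ == "__main__":
    # Three children reporting 3, 5 and 8 MSD designates (illustrative).
    shares = per_server_shares([3, 5, 8])
    print(all(share == shares[0] for share in shares))
    print(shares[0] == Fraction(HASH_SPACE, 16))
```

Exact rational arithmetic is used so that the equality check is not disturbed by floating-point rounding on 128-bit quantities.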


Now using equation (5) in equation (5), we get … which shows that the search-related workload is also generally equitable, provided that all keywords are equally likely to be searched. Although some keywords are more popular than others during short durations, the trend over significantly longer periods of time remains to be seen.


Table 5-5. DHT feature comparison (columns: routing table size, average hop count; surviving entries: m, $O(\log_b N)$, 2 + fan-out factor (k))

[38] that employs a binary tree arrangement of nodes, all participating nodes are leaves in the binary tree. In contrast, in mDNS the nodes higher up typically have a larger message routing burden compared to leaf nodes. The root node manages the overall hash space allotment and its subsequent management.

Similar to other DHT overlays that have a constant Relative Delay Penalty (RDP) factor [69], because of the nature of the mDNS hierarchy, where neighboring domains are more likely to be network neighbors too, we conjecture that the mDNS RDP factor will be within a constant factor of the actual network routing path length. Let us compare the various DHT schemes with respect to their routing table sizes, average routing hop counts, and logical node placement strategies. Table 5-5 shows the comparisons.

Among the compared DHT schemes, Chord [47] and Pastry [37] place nodes in a logical ID space ring, Tapestry [49][56] has nodes in a logical graph arrangement, Kademlia [38] constructs a binary prefix tree with nodes as leaves, CAN [39] arranges participating nodes in a d-dimensional coordinate space, and mDNS


In Table 5-5, 'm' represents the number of bits representing a Chord node ID. 'N' represents the number of participating nodes under Chord and Pastry, but represents the size of the name space with base 'b'. 'n' denotes the actual number of participating nodes for the Kademlia, CAN, and mDNS entries. For Pastry, 'b' represents the number of bits used to represent the base of the node ID representation, with b = 4 signifying a base-16 representation. For Pastry, 'l' denotes the size of the leaf set and the proximity neighbors list.

We presented an analytical assessment of session search complexity when using the Geo-DB database. We also presented arguments on the fairness claim made in the dissertation with respect to the participating domains in the overall 'mDNS' hierarchy. We presented search latency experiment results and their interpretations. We presented a comparative analysis of various popular P2P architectures and the architecture presented in this dissertation.


IP multicast is a very efficient mechanism for transmitting live data and video streams to large groups of recipients. Compared to unicast, it offers tremendous bandwidth savings to content providers and reduces traffic in the core network, freeing up expensive bandwidth for other services. The bandwidth savings become valuable and noticeable over thin Trans-Atlantic data pipes. From an end user's perspective, multicast improves the perceived quality of service because, instead of multiple data streams competing for a congested link resource, there exists one data stream, which therefore receives a larger share of the congested link's bandwidth.

Even though multicast has numerous benefits over unicast for data transmission in various scenarios, its end user demand and network deployment remain sparse. Unlike unicast, where the source and destination addresses are unique and generally somewhat stable, multicast addresses are usually assigned to a group for a short term. The group addressing is typically flat and offers routers no clue about the direction of message transmission. Data forwarding is therefore typically done using RPF checks and a shared distribution tree. The construction of the shared distribution tree and source discovery, including multicast routing, requires the network layer to implement several complex protocols such as MSDP, BGMP, Cisco-RP, and PIM-SM. This increased complexity in the network layer, and the resulting increase in network management complexity, deters system administrators from native multicast deployment. Furthermore, the lack of scalable, real-time global multicast session discovery support in the Internet and the lack of usability prevent end users from tapping into the benefits of multicast.

With the increasing deployment of IGMPv3 at the network edges, end users have gained the capability to filter and subscribe to the specific sources they are interested in. SSM reduces network layer complexity and thereby increases its acceptance by system


For unicast, there exist several search engines, like Google and Yahoo, that maintain indexes of the web content hosted in the Internet today. Why can we not leverage these engines for multicast content discovery as well? Multicast streams are short lived, and the data transmitted over them is also not available for a long time. Moreover, the group addressing is not stable and there is no global hierarchical structure in the addressing as exists for unicast addresses. These constraints make any crawler-based technology extremely difficult to use. Inherent delays associated with web crawler technology also make it almost impossible for short-duration, transient, non-planned sessions to ever be indexed and therefore discovered by the target audience.

To improve session discovery, this dissertation proposes using a tree DHT hierarchy. Use of a tree DHT structure allows for efficient search routing in the structure. The DHT proposed in this dissertation tries to achieve equal hash-space assignments among participating domains. The DHT structure is self-managing in the face of changes to the network topology. The use of the domain count reporting thresholds α and β imparts stability to the global routing structure. The DHT structure is also capable of recovering from intermittent domain failures. The use of geo-tagged databases allows an extra search criterion, namely geographical search, to be used.

The URS design and its close placement with the DNS infrastructure allows multicast sessions to be assigned long-term 'mDNS' URLs that an end user can bookmark to be used later. Even if the session parameters change later, the URLs would remain valid and would map to the correct session. The existence of a URS allows domain-specific searches to be performed from anywhere in the Internet. This in fact makes even a stand-alone deployment in just a single domain useful to end users. This ability for incremental deployment in the global Internet is an asset of the design. URS aids in


The integration of the tree-DHT scheme and the URS allows for improved session search capabilities in the network. Since the system is registration based and is capable of distributing the registration data to the appropriate site in the hierarchy in real time, even the most transient sessions become discoverable by end users.

We performed detailed simulation runs to test and discover optimal parameter settings for the DHT structure in different topology scenarios. For the scenario 1 simulation setup, simulation data suggests that the best performance is achieved if α, β ∈ [1.8, 2.0] with …. For scenario 2, the optimal system performance is achieved for … ∈ [0.4, 1.0] and … ∈ [1.8, 2.0]. For scenario 3 the system performed better with α, β ∈ [1.2, 2.0] with …. It is clear that the choice of α and β depends on the network topology. A system administrator is free to choose values of his/her liking, although it is advisable to follow the common selection guidelines for the full hierarchy. In order to maintain global routing table stability, a relatively high value of … is suggested, and for routing table stability at the subtree level, a higher value of … is advised.

New Services

[70] session that broadcasts jazz music from a location near you.

Picture another scenario: emergency channels multicast from a region hit by a natural disaster would generally be more effective in providing real-time relief information to residents in that area. Information such as where to go in order to get clean drinking water, a bag of ice, and medical aid, or casualty information, can be updated in real time by emergency workers present at disaster sites rather than


Geo-tagged multicast sessions could also herald an era of real-time yet discoverable citizen news reporting by eyewitnesses at news sites. Consider a scenario where a major traffic pileup has occurred on I95. A few eyewitnesses at the accident site may start a live video feed using their camera phones (modern cell phones are packing in more and more compute power) over 3G [71] or GPRS [72], register the multicast session using descriptive keywords such as I95, pileup, and accident, and let the whole world watch the news as it unfolds.

Raging California wildfires have made county officials issue voluntary evacuations. Homeowners who decide to move out are always anxious to find out the status of their homes. A few daredevils who decide to stay back could start a video feed of their surroundings; geo-tagging their session with the relevant location would make such sessions discoverable with more accuracy, and homeowners who evacuated could find out the status of that area.

Furthermore, network traffic sourced from a nearby location is generally more reliable and less affected by network vagaries. Link capacities and traffic profile have a tremendous impact on the quality of sessions that have a larger hop count. Therefore one would usually want to get content from sessions hosted at a location near oneself.

These are a few scenarios among many that suggest that geo-tagging of multicast sessions could have a significant impact on the way people would consider using multicast in the future. Not only would multicast be a viable alternative for transmitting live broadcasts on the Internet, but geo-tagging would also make it more appealing to the general public and would help in creating demand for multicast services from consumers. It would also enable


An organic research task would be to investigate the possibility of a shared multicast distribution tree in a mobile network with nodes coming in and going out at their own will. The true potential of `mDNS' can be realized if mobile users can multicast sessions on the fly and such extremely transient sessions are still discoverable by other users in the global Internet. Hopefully this dissertation is a starting point in reviving multicast research interest in the scientific community.
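As a rough illustration of the geo-tagged, keyword-based discovery scenarios described in this chapter, the following minimal sketch filters registered sessions by keyword and by great-circle distance from the searching user. It is not the mDNS implementation: the `Session` record, its field names, the `search` function and the radius parameter are all hypothetical names introduced here, and the haversine formula stands in for whatever proximity measure a real deployment would use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical registration record for a geo-tagged multicast session."""
    uri: str              # illustrative identifier, not the real URS format
    keywords: frozenset   # lower-case descriptive keywords
    lat: float            # latitude in degrees
    lon: float            # longitude in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points on a sphere."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def search(sessions, keyword, lat, lon, radius_km):
    """Return sessions tagged with `keyword` within `radius_km`, nearest first."""
    wanted = keyword.lower()
    hits = []
    for s in sessions:
        if wanted in s.keywords:
            d = haversine_km(lat, lon, s.lat, s.lon)
            if d <= radius_km:
                hits.append((d, s))
    hits.sort(key=lambda pair: pair[0])  # sort on distance only
    return [s for _, s in hits]
```

Sorting by distance reflects the observation above that content sourced from a nearby location is usually preferable; a real directory would presumably combine this with the hierarchy-based lookup rather than scan a flat list.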


gator, hindi, rediff, football, soccer, movies, audio, songs, picture, piyush, amrita, table, dinner, restaurant, match, base, tyre, car, couch, potato, refrigerator, shelf, motorcycle, sweater, shirt, dress, purse, mobile, watch, clock, top, jacket, coat, idol, deity, kitchen, market, mall, road, footpath, spectacle, television, knife, board, onion, jalapeno, beer, time, mouse, telefone, pen, cover, case, copy, book, pencil, light, bulb, fan, tape, suitcase, paper, garland, garden, flower, carpet, tie, necklace, lens, camera, battery, cake, icing, sugar, milk, egg, water, envelope, drawer, cheque, belt, shoe, slipper, scanner, cards, rocket, shuttle, tennis, ball, legs, hands, fingers, nail, toe, hammer, srew, plier, match-stick, gun, fun, park, swing, slope, ranch, grass, bike, helmet, gear, gloves, batter, pillow, quilt, tissue, mop, broom, cargo, sweet, perfume, frangrance, meat, butter, salt, tea, coffee, ground, boil, receipt, plastic, floor, wire, number, frown, torch, rope, tent, camp, row, boat, tide, river, stream, ocean, mountain, mushroom, fungi, algae, ferns, leaf, bud, eggplant, cucumber, radish, mustard, honey, oil, pan, spatula, mixer, dough, juice, cook, cookie, spice, walnut, cinnamon, eat, jump, hop, run, play, alligator, turtle, fish, snake, slime, moss, bullet, cannon, lamp, medicine, vitamin, cholera, disease, hospital, doctor, nurse, patient, foot, malaria, scalp, ear, throat, drink, force, hair, long, dictionary, speaker, album, mirror, lip-stick, petroleum, gasoline, fluorine, asbestos, arsenic, mild, wild, animal, deep, blue, whale, dolphin, puppy, birds, aquarium, radium, mars, planet, solar, sun, rays, ozone, atmosphere, aeroplane, flight, orange, pretzel, dance, salsa, latino, pepper, good, sauce, scream, shout, yell, radio, next, rock, guitar, saxophone, castle, stairs, porch, patio, change, pool, fry, saute, grind, burn, churn, turn, garbage, dust-bin, bun, noodles, rice, ring, police, jeep, truck, bus, children, school, nursery, animation, alien, combat, challenge, whip, leash, cream, pie, hat, bat, door, kid, prank, switch, blanket, death, fear, insect, net, mosquito, robot, laser, robot, hello, greet, smile, grin, strap, breeze, wind, air, gale, hurricane, storm, rain, current, ship, yatch, enough
Data was collected using up to 5 domains connected according to the scenario 3 hierarchy. The `Virtual DNS', MSD, and URS server parameters were set up according to the configuration details for domains 10, 1, 4, 5, and 2 provided earlier.


[1] R. Wright, IP Routing Primer. Macmillan Technical Publishing, 1998.
[2] C. Partridge, T. Mendez, and W. Milliken, "Host Anycasting Service," RFC 1546 (Informational), Internet Engineering Task Force, Nov. 1993. [Online]. Available:
[3] B. Williamson, Developing IP Multicast Networks. Cisco Press, 1999.
[4] B. M. Edwards and B. Wright, Interdomain Multicast Routing: Practical Juniper Networks and Cisco Systems Solutions. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2002, foreword by John W. Stewart.
[5] S. Bhattacharyya, "An Overview of Source-Specific Multicast (SSM)," RFC 3569 (Informational), Internet Engineering Task Force, July 2003. [Online]. Available:
[6] S. Deering, "Host extensions for IP multicasting," RFC 1112 (Standard), Internet Engineering Task Force, Aug. 1989, updated by RFC 2236. [Online]. Available:
[7] D. Farinacci and L. Wei, Auto-RP: Automatic discovery of Group-to-RP mappings for IP multicast. CISCO Press, Sept 9, 1998. [Online]. Available:
[8] D. Meyer, "Administratively scoped IP multicast," RFC 2365 (Best Current Practice), July 1998. [Online]. Available:
[9] M. Handley, "Session directories and scalable internet multicast address allocation," SIGCOMM Comput. Commun. Rev., vol. 28, no. 4, pp. 105, 1998.
[10] D. Zappala, V. Lo, and C. GauthierDickey, "The multicast address allocation problem: Theory and practice," Special Issue of Computer Networks, 2004.
[11] V. Lo, D. Zappala, C. Gauthierdickey, and T. Singer, "A theoretical framework for the multicast address allocation problem," in IEEE Globecom, Global Internet Symposium, Tech. Rep., 2002.
[12] M. Livingston, V. Lo, K. Windisch, and D. Zappala, "Cyclic block allocation: A new scheme for hierarchical multicast address allocation," in First International Workshop on Networked Group Communication. Bowersock, 1999, pp. 216. [Online]. Available:
[13] S. Pejhan, A. Eleftheriadis, and D. Anastassiou, "Distributed multicast address management in the global Internet," Selected Areas in Communications, IEEE Journal on, vol. 13, no. 8, pp. 1445, Oct 1995.


[14] S. Kumar, P. Radoslavov, D. Thaler, C. Alaettinoglu, D. Estrin, and M. Handley, "The MASC/BGMP architecture for inter-domain multicast routing," SIGCOMM Comput. Commun. Rev., vol. 28, no. 4, pp. 93, 1998.
[15] V. Jacobson, "Multimedia conferencing on the Internet," SIGCOMM, Aug 1994.
[16] Y. K. Dalal and R. M. Metcalfe, "Reverse path forwarding of broadcast packets," Commun. ACM, vol. 21, no. 12, pp. 1040, 1978.
[17] D. Waitzman, C. Partridge, and S. Deering, "Distance Vector Multicast Routing Protocol," RFC 1075 (Experimental), Internet Engineering Task Force, Nov. 1988. [Online]. Available:
[18] A. S. Thyagarajan and S. E. Deering, "Hierarchical distance-vector multicast routing for the MBone," in SIGCOMM '95: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication. New York, NY, USA: ACM, 1995, pp. 60.
[19] D. Estrin, D. Farinacci, A. Helmy, V. Jacobson, and L. Wei, "Protocol independent multicast (PIM) dense mode protocol specification," 1996. [Online]. Available:
[20] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, and L. Wei, "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification," RFC 2362 (Experimental), Internet Engineering Task Force, June 1998, obsoleted by RFCs 4601, 5059. [Online]. Available:
[21] J. Moy, "Multicast Extensions to OSPF," RFC 1584 (Historic), Internet Engineering Task Force, Mar. 1994. [Online]. Available:
[22] A. Ballardie, "Core Based Trees (CBT) Multicast Routing Architecture," RFC 2201 (Historic), Internet Engineering Task Force, Sept. 1997. [Online]. Available:
[23] T. Bates, R. Chandra, D. Katz, and Y. Rekhter, "Multiprotocol Extensions for BGP-4," RFC 4760 (Draft Standard), Internet Engineering Task Force, Jan. 2007. [Online]. Available:
[24] D. Thaler, "Border gateway multicast protocol (BGMP): Protocol specification," RFC 3913 (Historic), Sep 2004. [Online]. Available:
[25] B. Fenner and D. Meyer, "Multicast Source Discovery Protocol (MSDP)," RFC 3618 (Experimental), Internet Engineering Task Force, Oct. 2003. [Online]. Available:


[26] S. Deering, W. Fenner, and B. Haberman, "Multicast listener discovery (MLD) for IPv6," RFC 2710 (Proposed Standard), Oct 1999, updated by RFCs 3590, 3810. [Online]. Available:
[27] B. Haberman, "Source address selection for the multicast listener discovery (MLD) protocol," RFC 3590 (Proposed Standard), Sep 2003. [Online]. Available:
[28] R. Vida and L. Costa, "Multicast listener discovery version 2 (MLDv2) for IPv6," RFC 3810 (Proposed Standard), Jun 2004, updated by RFC 4604. [Online]. Available:
[29] H. Holbrook, B. Cain, and B. Haberman, "Using internet group management protocol version 3 (IGMPv3) and multicast listener discovery protocol version 2 (MLDv2) for source-specific multicast," RFC 4604 (Proposed Standard), Aug 2006. [Online]. Available:
[30] W. Fenner, "Internet Group Management Protocol, Version 2," RFC 2236 (Proposed Standard), Internet Engineering Task Force, Nov. 1997, obsoleted by RFC 3376. [Online]. Available:
[31] B. Cain, S. Deering, I. Kouvelas, B. Fenner, and A. Thyagarajan, "Internet Group Management Protocol, Version 3," RFC 3376 (Proposed Standard), Internet Engineering Task Force, Oct. 2002, updated by RFC 4604. [Online]. Available:
[32] P. Mockapetris and K. J. Dunlap, "Development of the domain name system," SIGCOMM Comput. Commun. Rev., vol. 18, no. 4, pp. 123, 1988.
[33] "BBC multicast trial." [Online]. Available:
[34] M. Naor and M. Yung, "Universal one-way hash functions and their cryptographic applications," in STOC '89: Proceedings of the twenty-first annual ACM symposium on Theory of computing. New York, NY, USA: ACM, 1989, pp. 33.
[35] R. Rivest, "The MD5 message-digest algorithm," RFC 1321 (Informational), Apr 1992. [Online]. Available:
[36] P. Harsh and R. Newman, "A Hierarchical Multicast Session Directory Service Architecture," Nov 2009, Internet Engineering Task Force, ID 19409. [Online]. Available:


[37] A. I. T. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems," in Middleware '01: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg. London, UK: Springer-Verlag, 2001, pp. 329.
[38] P. Maymounkov and D. Mazières, "Kademlia: A peer-to-peer information system based on the XOR metric," in Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2002, pp. 53.
[39] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker, "A scalable content-addressable network," in SIGCOMM '01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications. New York, NY, USA: ACM, 2001, pp. 161.
[40] M. Handley, "The sdr session directory: An MBone conference scheduling and booking system," April 1996. [Online]. Available:
[41] P. Namburi and K. Sarac, "Multicast session announcements on top of SSM," Communications, 2004 IEEE International Conference on, vol. 3, pp. 1446 Vol. 3, 20-24 June 2004.
[42] P. Liefooghe and M. Goosens, "The next generation IP multicast session directory," SCI, Orlando FL, July 2003.
[43] C. M. Bowman, P. B. Danzig, D. R. Hardy, U. Manber, and M. F. Schwartz, "The Harvest information discovery and access system," Computer Networks and ISDN Systems, vol. 28, no. 1, pp. 119, December 1995.
[44] A. Swan, S. McCanne, and L. A. Rowe, "Layered transmission and caching for the multicast session directory service," in ACM Multimedia, 1998, pp. 119. [Online]. Available:
[45] A. Santos, J. Macedo, and V. Freitas, "Towards multicast session directory services." [Online]. Available:
[46] N. Sturtevant, N. Tang, and L. Zhang, "The information discovery graph: towards a scalable multimedia resource directory," Internet Applications, 1999. IEEE Workshop on, pp. 72, Aug 1999.
[47] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," SIGCOMM Comput. Commun. Rev., vol. 31, no. 4, pp. 149, 2001.


[48] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica, and H. Yu, "OpenDHT: a public DHT service and its uses," in SIGCOMM '05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications. New York, NY, USA: ACM, 2005, pp. 73.
[49] B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz, "Tapestry: a resilient global-scale overlay for service deployment," Selected Areas in Communications, IEEE Journal on, vol. 22, no. 1, pp. 41, Jan. 2004.
[50] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, "Handling churn in a DHT," in ATEC '04: Proceedings of the annual conference on USENIX Annual Technical Conference. Berkeley, CA, USA: USENIX Association, 2004, pp. 10.
[51] M. J. Freedman, K. Lakshminarayanan, S. Rhea, and I. Stoica, "Non-transitive connectivity and DHTs," WORLDS '05: Proceedings of the 2nd conference on Real, Large Distributed Systems, pp. 55, 2005.
[52] K. P. N. Puttaswamy and B. Y. Zhao, "A case for unstructured distributed hash tables," in Proc. of Global Internet Symposium, Anchorage, AK, May 2007.
[53] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica, and H. Yu, "OpenDHT: a public DHT service and its uses," in SIGCOMM '05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications. New York, NY, USA: ACM, 2005, pp. 73.
[54] T. Roscoe and S. Hand, "Palimpsest: soft-capacity storage for planetary-scale services," in HOTOS '03: Proceedings of the 9th conference on Hot Topics in Operating Systems. Berkeley, CA, USA: USENIX Association, 2003, pp. 22.
[55] B. K. Sylvia, S. R. S. Rhea, and S. Shenker, "Spurring adoption of DHTs with OpenHash, a public DHT service," in IPTPS, 2004.
[56] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph, "Tapestry: An infrastructure for fault-tolerant wide-area location and," University of California at Berkeley, Berkeley, CA, USA, Tech. Rep., 2001.
[57] C. G. Plaxton, R. Rajaraman, and A. W. Richa, "Accessing nearby copies of replicated objects in a distributed environment," in SPAA '97: Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures. New York, NY, USA: ACM, 1997, pp. 311.
[58] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, and B. Zhao, "OceanStore: an architecture for global-scale persistent storage," in ASPLOS-IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems. New York, NY, USA: ACM, 2000, pp. 190.


[59] S. Q. Zhuang, B. Y. Zhao, A. D. Joseph, R. H. Katz, and J. D. Kubiatowicz, "Bayeux: an architecture for scalable and fault-tolerant wide-area data dissemination," in NOSSDAV '01: Proceedings of the 11th international workshop on Network and operating systems support for digital audio and video. New York, NY, USA: ACM, 2001, pp. 11.
[60] A. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility," SIGOPS Oper. Syst. Rev., vol. 35, no. 5, pp. 188, 2001.
[61] A. I. T. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel, "SCRIBE: The design of a large-scale event notification infrastructure," in NGC '01: Proceedings of the Third International COST264 Workshop on Networked Group Communication. London, UK: Springer-Verlag, 2001, pp. 30.
[62] M. Castro, P. Druschel, A.-M. Kermarrec, and A. Rowstron, "One ring to rule them all: service discovery and binding in structured peer-to-peer overlay networks," in EW 10: Proceedings of the 10th workshop on ACM SIGOPS European workshop. New York, NY, USA: ACM, 2002, pp. 140.
[63] R. Droms, "Automated configuration of TCP/IP with DHCP," Internet Computing, IEEE, vol. 3, no. 4, pp. 45, Jul/Aug 1999.
[64] R. Droms, "Dynamic host configuration protocol," RFC 2131 (Draft Standard), Mar 1997, updated by RFCs 3396, 4361. [Online]. Available:
[65] R. Karedla, J. S. Love, and B. G. Wherry, "Caching strategies to improve disk system performance," Computer, vol. 27, no. 3, pp. 38, 1994.
[66] E. J. O'Neil, P. E. O'Neil, and G. Weikum, "The LRU-K page replacement algorithm for database disk buffering," in SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data. New York, NY, USA: ACM, 1993, pp. 297.
[67] "The Network Simulator - ns-2." [Online]. Available:
[68] P. Harsh, "mDNS simulation data access website." [Online]. Available:
[69] Y.-h. Chu, S. G. Rao, and H. Zhang, "A case for end system multicast (keynote address)," in SIGMETRICS '00: Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems. New York, NY, USA: ACM, 2000, pp. 1.


[70] S. E. Deering, "Multicast routing in internetworks and extended LANs," in SIGCOMM '88: Symposium proceedings on Communications architectures and protocols. New York, NY, USA: ACM, 1988, pp. 55.
[71] G. Camarillo and M. A. Garcia-Martin, The 3G IP Multimedia Subsystem (IMS): Merging the Internet and the Cellular Worlds. John Wiley & Sons, 2006.
[72] R. Kalden, I. Meirick, and M. Meyer, "Wireless internet access based on GPRS," 2000. [Online]. Available:


Piyush Harsh was born into a well-educated and scientifically oriented family. His father, who is an M.D., has been the biggest influence on him and instilled scientific curiosity right from his early childhood. He was always a good student, excelling in his studies in high school. All his hard work paid off when he got a chance to study at the Indian Institute of Technology, Roorkee. He graduated with a bachelor's degree in Computer Science and Technology in Spring 2003. To further his scientific training he decided to accept a full scholarship from the University of Florida, joined the Ph.D. program at the Department of Computer Science and Engineering in Fall 2003, and came to the US. He was the first in his family ever to travel abroad for higher education. Under the able guidance of Dr. Richard Newman (his adviser) and his Ph.D. committee members, especially Dr. Randy Chow, he was involved in numerous scientific projects. During his stay at the University of Florida, he worked in the fields of security, computer networks and cognitive computing. Lately his research interest has been focused on bio-inspired network models, including ways to adapt models of the human brain into future network design. When he is not doing research work, he enjoys outdoor activities including long-distance trail biking and hiking in nature reserves. He believes in preservation of the environment and aspires to be an active participant in this noble cause in the near future.
