University of Florida, Department of Computer and Information Science and Engineering Technical Report, 1994

A Fast and Low Overhead Distributed Priority Lock*

Theodore Johnson and Richard Newman-Wolfe
Dept. of CIS, University of Florida
Gainesville, FL 32611-2024
ted@cis.ufl.edu, nemo@cis.ufl.edu




Abstract
Distributed synchronization is necessary to coordinate the diverse activities of a distributed
system. Priority synchronization is needed for real time systems, or to improve the performance
of critical tasks. We present a distributed priority lock that uses Li and Hudak's path compres-
sion methods to achieve a theoretical O(log n) messages per critical section request, where n is
the number of processors. In addition, our algorithm requires only O(log n) bits of storage per
processor, by making use of distributed lists. We present performance results to show that the
expected message complexity of the algorithm is indeed O(log n) per critical section request.
The low storage and overhead requirements of the algorithm make it scalable and practical for
implementation. In addition to its use in synchronization, our algorithm has applications to
distributed shared virtual memory consistency with novel check-in/check-out semantics.


1 Introduction

Distributed synchronization is an important activity that is required to coordinate access to shared
resources in a distributed system. A set of n processors synchronize their access to a shared resource
by requesting an exclusive privilege to access the resource. The privilege is often represented as a
token. Real time systems, or systems that have critical tasks that must execute quickly for good
performance, need prioritized synchronization. In priority synchronization, every request for the
token has a priority attached. When the token holder releases the token, it should be given to the
processor with the highest priority request.
The processes synchronize by sending and interpreting messages according to a synchronization
protocol. We assume that every message that is sent is eventually received, but that messages can
be received out of order.
In this paper, we present a distributed priority synchronization algorithm that requires O(log n)
bits of storage per processor (the O(log n) bits are required to store the names of O(1) processors).
The algorithm uses the path compression technique of Li and Hudak [11] to obtain a theoretical
O(log n) messages per request. Our performance results show that O(log n) messages are required

*We acknowledge the support of USRA grant #5555-19 and NSF grant DMS-9223088









in practice. The low space and message passing overhead make it scalable and practical for imple-
mentation.
Considerable attention has been paid to the problem of distributed synchronization. Lamport
[10] proposes a timestamp-based distributed synchronization algorithm. A processor broadcasts its
request for the token to all of the other processors, which reply with a permission. A processor
implicitly receives the token when it receives permissions from all other processors. Ricart and
Agrawala [16] and Carvalho and Roucairol [2] improve on Lamport's algorithm by reducing the
message passing overhead. However, all of these algorithms require O(n) messages per request.
Thomas [17] introduces the idea of quorum consensus for distributed synchronization. When a
processor requests the token, it sends a vote request to all of the other processors in the system.
A processor will vote for the critical section entry of at most one processor at a time. When a
processor receives a majority of the votes, it implicitly receives the token. The number of votes
that are required to obtain the token can be reduced by observing that the only requirement for
mutual exclusion is that any pair of processors require a vote from the same processor. Maekawa
[12] presents an algorithm that requires O(√n) messages per request and O(√n log n) space per
processor. Kumar [9] presents the hierarchical quorum consensus protocol, which requires O(n^0.63)
votes for consensus, but is more fault tolerant than Maekawa's algorithm.
Li and Hudak [11] present a distributed synchronization algorithm to enforce coherence in a
distributed shared virtual memory (DSVM) system. In DSVM, a page of memory in a processor is
treated as a cached version of a globally shared memory page. Typical cache coherence algorithms
require a home site for the shared page, which tracks the positions of the copies of the page. The
'distributed dynamic' algorithm of Li and Hudak removes the need for a fixed reference point that
will locate a shared page. Instead, every processor associates a pointer with each globally shared
page. This pointer is a guess about the current location of the page. When the system is quiescent,
the pointers form a tree that is rooted at the current page owner.
When a processor faults on a non-resident page, it sends a request to the pointed-to processor.
Eventually, the page is returned and the faulting process unblocks. The request for the page follows
the chain of pointers until it reaches a processor that owns the page (or will own the page shortly).
If a processor owns a page and receives a request for it, the processor services the request by
returning the page and setting its pointer to the new page owner. If a processor is requesting a
page and receives a request for the page, the request is blocked until the processor receives and
uses the page. If a processor receives a request for a page, and neither owns nor is requesting the
page, the processor forwards the request and changes its pointer to the requestor (who will soon
own the page).
Though a request for a page might make n - 1 hops to find the owner, the path compression
that occurs while the hops are being made guarantees that a sequence of K requests for the page









requires only O(n + K log n) messages. However, the blocking that the algorithm requires incurs an
O(n) space overhead, to store the identities of the blocked requests.
Raymond [15] has proposed a simple synchronization algorithm that can be configured to require
O(log n) storage per processor and O(log n) messages per critical section request. The algorithm
organizes the participating processors in a fixed tree. The execution of the algorithm is similar
to that of the Li and Hudak algorithm. Woo and Newman-Wolfe [18] use a fixed tree based on a
Huffman code. Because the tree is fixed, however, it does not adapt to the pattern of requests in the
system. Often, only a small population of the processors make requests for the token. Processors
that do not request the token should not be required to take part in the synchronization. Li and
Hudak show that their path compression algorithm requires O(n + K log q) messages if only q of
the n processors use the page.
Some work has been done to develop prioritized critical section algorithms. Goscinski [5] has
proposed a fully distributed priority synchronization algorithm. However, this algorithm requires
O(nlog n) storage per processor and O(n) messages per critical section request. Recent work on
multiprocessor priority synchronization algorithms has focused on contention-free algorithms, with
algorithms proposed by Markatos and LeBlanc [13], Craig [4], and Johnson and Harathi [6].
Our priority synchronization algorithm uses the path compression technique of Li and Hudak
to achieve low message passing overhead. To avoid the O(n) storage cost of blocking, the algorithm
uses distributed lists to block processes externally instead of internally.
Some work has been done on distributed lists, primarily in the context of directory-based cache
coherence algorithms. For example, the Scalable Coherent Interface (SCI) [8] uses a distributed
queue to chain together all of the processors that are requesting access to a memory block. The
algorithm is greatly simplified because the pointer to the head of the list is stored in a standard
place (the home memory block). Translated to a distributed algorithm, the SCI algorithm requires
a manager at a fixed site to remember the head of the list. In a path compression algorithm, no
such fixed-site manager is available. We note that Li and Hudak found that their path compression
algorithm had far superior performance to algorithms that required fixed site managers. A shared
memory synchronization algorithm that is quite similar in nature to the SCI algorithm is the
contention-free lock of Mellor-Crummey and Scott [14]. The recent contention-free priority locks
are based on the Mellor-Crummey and Scott lock.
The contribution of this paper is to present a practical distributed priority synchronization
algorithm that requires only O(log n) storage per processor and O(log n) messages per synchroniza-
tion request, where n is the number of processors. To obtain this performance, we make a novel
use of distributed lists to transfer the burden of remembering which processes are blocked to the
blocked processes themselves. Previous (non-prioritized) distributed synchronization algorithms
that achieve O(log n) messages per synchronization request and O(log n) bits of storage per proces-









sor require action on the part of processes that are not involved in the synchronization. Previous
dynamic path compression algorithms require O(n) storage. In DSVM algorithms, the owner of a
page releases it very quickly, so little blocking occurs and the large potential storage overhead is
not considered a problem. New DSVM algorithms use check-in/check-out semantics [7] and block-
ing might become significant. Furthermore, the techniques described in this paper can be used
to support DSVM efficiently with novel sharing semantics. Thus, the techniques described in this
work can be adapted to improve non-prioritized distributed synchronization also.


2 Basic Design

Our starting point is the path compression algorithm of Li and Hudak. Translated to distributed
synchronization, their algorithm works as follows. Every process has a pointer, currentdir, that
initially points in the direction of the current token holder. When processor A decides that it needs
the token, it sends a request to the processor indicated by currentdir. If a processor that neither
holds nor is requesting the token receives the request from A, it forwards the request to the processor
indicated by its version of currentdir and then sets currentdir to A. If a processor that holds or
is requesting the token receives A's request, it stores the request and takes no further action. This
process is illustrated in Figure 1. When a processor releases the token, it checks to see if there are
any blocked requests. If so, the processor sends the token to one of the requestors, sets its version
of currentdir to the new requestor, and unblocks any remaining blocked requests. If there are no
blocked requests, the processor holds the token until a request arrives, and sends the token to the
requestor (with a corresponding update to currentdir).

Figure 1: Execution of Li and Hudak's path compression synchronization.
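
As a point of reference, the forwarding step of this non-prioritized scheme can be sketched as
follows. This is a minimal illustrative sketch in Python, not the paper's code: the class and
function names are ours, it assumes reliable delivery, and it ignores how blocked requests are
later served.

# Sketch (ours) of Li-and-Hudak-style request forwarding with path compression.
class Node:
    def __init__(self, name, currentdir, holds_token=False):
        self.name = name                 # processor id
        self.currentdir = currentdir     # guess at the direction of the token
        self.holds_token = holds_token
        self.requesting = False
        self.blocked = []                # requests stored by a holder or requestor

def receive_request(nodes, here, requestor):
    node = nodes[here]
    if node.holds_token or node.requesting:
        node.blocked.append(requestor)   # store the request; no further action
    else:
        forward_to = node.currentdir
        node.currentdir = requestor      # the requestor will soon own the token
        receive_request(nodes, forward_to, requestor)

# Example: D -> C -> B -> A (token holder); a request from D compresses the path.
nodes = {"A": Node("A", "A", holds_token=True),
         "B": Node("B", "A"), "C": Node("C", "B"), "D": Node("D", "C")}
nodes["D"].requesting = True
receive_request(nodes, nodes["D"].currentdir, "D")
print([(n, nodes[n].currentdir) for n in "ABC"])   # B and C now point at D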
In a non-prioritized lock, it is permissible to block a request at a process that is requesting the
token because the blocker should get the token first anyway. In priority synchronization, every
request has a priority attached. The blocked request might have a higher priority than the blocker.
So, a process must be able to find the set of all current requestors to register its request.
Thus, all requesting processes must know about each other. Since we are allowed only O(log n)
storage, the processes can only use pointers to form lists. In this algorithm we make the requesting
processes form a waiting ring. All processes point to the next lower priority process, except for the
lowest priority process which points to the highest priority process. Thus, in addition to knowing
the identity of its successor, a task in the waiting ring also knows the priority of its successor's
request. The token holder must be able to release the token into the waiting ring, so it points to a
process in this ring. The processes that are not requesting the token might make a request, so they
also lie on paths that point to the ring. The structure of the synchronization is shown in Figure 2.
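
For concreteness, here is a small sketch (ours, with made-up priorities) of the state a waiting
ring holds: each member records its successor and the successor's priority, and only the lowest
priority member sees a successor with a higher priority.

# Hypothetical waiting ring holding requests with priorities 14, 12 and 7.
# Each member stores (successor, successor's priority); members point to the
# next lower priority, and the lowest points back to the highest.
ring = {
    14: (12, 12),
    12: (7, 7),
    7:  (14, 14),   # lowest priority member points to the highest
}

def is_lowest(member_priority):
    # A member knows it is the lowest because its successor has a higher priority.
    successor, successor_priority = ring[member_priority]
    return successor_priority > member_priority

assert is_lowest(7) and not is_lowest(12) and not is_lowest(14)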
The algorithm executes as a distributed protocol. When a process decides to ask for the token,
it sends its request to the processor indicated by its forwarding pointer. Eventually, the processor
will receive the token and enter the critical section. A processor will receive unsolicited messages,
and will treat them as events to be handled. Event handling will in general cause a change to the
local state variables and often cause messages to be sent. A picture of the processor architecture is
shown in Figure 3. The unprocessed events are stored in a pending event queue. As will become
clear, there are times when a processor cannot correctly process an event in the event queue. Since
we want the algorithm to use O(log n) storage, we can block the processing of only a finite
number of events at any time.
To simplify the presentation, we assume that all priorities are unique. Non-unique priorities can
be made unique by attaching a timestamp. In addition, the algorithm can be modified to handle
non-unique priorities directly.
The processors handle two classes of events: those related to requesting the token, and those
related to releasing the token. Although these activities have some subtle interactions, we initially
describe them separately.


2.1 Requesting the Token

Each processor p that is not holding or requesting the token stores a guess about the identity of
the token holder in the local variable currentdir. Note that each processor has its own version of
currentdir.
Figure 2: Structure of the distributed priority synchronization (the waiting ring).

When a processor decides that it needs to use the token, it sends a Token-Request to the
processor indicated by currentdir (if the processor already has the token, the processor can use it
directly). When a processor that neither holds nor is requesting the token receives a Token-Request
message, it forwards the request to the processor indicated by currentdir. After receiving the
message, the processor knows that the requestor will soon have the token, so the processor changes
currentdir to point to the requesting processor. This will continue until the Token-Request
message reaches the token holder or a process that is in the waiting ring.
When a Token-Request message reaches a process, P, in the waiting ring, the correct position
in the ring must be found for the requestor. If the requestor's priority is between that of P and P's
successor, or if P is the lowest priority process in the ring and the requestor's priority is lower than
P's or greater than P's successor, then the requestor should follow P. P changes its ring pointer
(we can re-use currentdir) to point to the requestor, and sends the requestor a Request-Done
message, indicating the requestor's successor in the ring. When the requestor receives the Request-
Done message, it sets its value of currentdir to its successor in the waiting ring. Otherwise, P
sends the Token-Request message to its successor in the ring.
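
A sketch (our reconstruction, assuming unique priorities) of the test a ring member P applies
when a Token-Request reaches it:

# Does the requestor belong immediately after ring member P?
# p_pri: P's priority; succ_pri: priority of P's successor; req_pri: the request's priority.
def requestor_follows_p(p_pri, succ_pri, req_pri):
    if p_pri > succ_pri:
        # Ordinary member: its successor holds the next lower priority request.
        return succ_pri < req_pri < p_pri
    # P is the lowest priority member (its successor is the highest priority member):
    # the requestor follows P if it is the new lowest or the new highest.
    return req_pri < p_pri or req_pri > succ_pri

# Example ring 14 -> 12 -> 7 -> 14: a request with priority 13 follows 14,
# a request with priority 20 or 5 follows 7, and is forwarded in every other case.
assert requestor_follows_p(14, 12, 13)
assert requestor_follows_p(7, 14, 20) and requestor_follows_p(7, 14, 5)
assert not requestor_follows_p(12, 7, 13)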
If a Token-Request message arrives at the token holder and the token holder is not using the
token, the token holder replies to the requestor with the token. If the Token holder is using the
token, then there might or might not be a waiting ring. If the waiting ring exists, the message is
forwarded into the waiting ring. Otherwise, the token holder replies with a Request-Done message
indicating that the requestor is the only process in the waiting ring.

Figure 3: Architecture of the protocol.

2.2 Releasing the Token

If there are no processors waiting to acquire the token, the token holder sets an internal flag to
indicate that the token is available. Otherwise, the token holder releases the token into the waiting
ring. When a processor receives the token, it cannot immediately use the token, since the processor
does not know if it is the highest priority processor. Instead, the processor passes the token to
its successor. The lowest priority process knows that it is the lowest priority process (because
its successor has a higher priority) and therefore that its successor is the highest priority process.
Thus, the lowest priority process marks the token before passing it to its successor. When a process
receives a marked token, it accepts the token and enters the critical section.
After the new token holder accepts the token, the waiting ring structure must be repaired. The
token holder sends the address of its successor to its predecessor (the process that forwarded the
token), which updates its currentdir pointer. Exceptions occur if the new token holder was alone
in the waiting ring, or if the previous token holder released the token directly to the new token
holder.
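
The following sketch (ours; ring members are named by their priorities for brevity) traces a
release into a hypothetical ring and shows that the marked token always ends up at the highest
priority requestor:

# Ring 14 -> 12 -> 7 -> 14 (each member points to the next lower priority,
# and the lowest points back to the highest).
ring_successor = {14: 12, 12: 7, 7: 14}

def deliver_token(entry_member):
    member, marked = entry_member, False
    while not marked:
        # Only the lowest priority member has a higher priority successor;
        # it marks the token before passing it on.
        if ring_successor[member] > member:
            marked = True
        member = ring_successor[member]
    return member   # the member that accepts the marked token

# No matter where the released token enters the ring, priority 14 accepts it.
assert deliver_token(12) == deliver_token(7) == deliver_token(14) == 14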
If we instead required processes in the waiting ring to point to the processor with the next higher
priority request, it would be the highest priority process that knows its status. However, the ring still needs to be









repaired, and the token holder still needs to update the currentdir pointer in its predecessor. In
general, finding the predecessor would require a circuit of the waiting ring, incurring a high message
cost.


3 Implementation Details

The essential operation of the algorithm is as described in the previous section. However, there
are a number of details that must be addressed to ensure the correct execution of the algorithm.
The complications occur primarily because of two concerns: out-of-order messages and the O(log n)
storage requirement.

3.1 Out of Order Messages

Out of order messages occur when a message arrives that was not expected. Usually, the problem
is due to non-causal message delivery. That is, processor A sends message m1 to processor B, then
sends message m2 to processor C. Processor C receives m2 and sends m3 to B. At B, message
m3 arrives before message m1. For example, a requesting processor can be given the token before
being told that it is part of the waiting ring. This problem usually occurs when a new processor
is admitted to the waiting ring. If A has successor B in the waiting ring, then admits C into the
ring as its new successor, the first message from C to B might arrive before the last message from
A to B. In our example, the last message from A to B tells B that it is in the waiting ring, and
the first message from C to B is the token. Considerable work has been done on implementing
causal communications [1], but this work requires that all messages are broadcast, which in general
requires O(n) messages.
The messages that need to be processed in order involve a processor's participation in
waiting ring maintenance. The out of order reception can be detected (i.e., you are given the token
before entering the waiting ring), and the processing of the too-early message can be blocked until
the appropriate predecessor message arrives. There can be only O(1) messages that can arrive too
early, so delaying their processing does not impose too large a space requirement. Because the
protocol must handle non-causal messages, it also correctly handles messages from a processor that
are delivered out of order.
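
One way to realize this deferral is sketched below, under our own naming rather than the paper's
code: each event type carries a readiness predicate over the local state, and an event whose
predicate does not yet hold is parked and retried later, mirroring the "wait until ..." guards in
the handlers of Section 4. In the real protocol, later messages eventually enable a parked event;
the sketch simply stops once nothing in the queue can make progress.

from collections import deque

def run_events(events, state, handlers):
    """handlers maps an event name to (ready_predicate, handler_function)."""
    pending = deque(events)
    parked_in_a_row = 0
    while pending and parked_in_a_row < len(pending):
        name, args = pending.popleft()
        ready, handle = handlers[name]
        if ready(state):
            handle(state, *args)
            parked_in_a_row = 0
        else:
            # Too-early event (e.g. a TOKEN arriving before the REQUEST_DONE that
            # puts this processor in the waiting ring): park it and retry later.
            pending.append((name, args))
            parked_in_a_row += 1
    return pending   # events still waiting for a message that has not arrived yet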

3.2 No Blocking

There are many occasions when a processor, A, cannot correctly process a message that has arrived
(as discussed above). If the unexpected message involves A's state, then the message processing
can be safely blocked because there can be only O(1) such messages. However, the message might
be a request from a different processor, B. Since many processors might send their request to A,
processor A must be able to process these requests immediately.









In the algorithm by Li and Hudak, a processor blocks requests from other processors when
it holds the token or is requesting the token. In our algorithm, the token holder does not block
requests, instead it tells the requestors to form a waiting ring. Once a requesting processor joins
the waiting ring, it helps the new requests to also join the waiting ring. However, during the time
that a processor, A, requests the token and joins the waiting ring, it cannot handle requests from
other processors. The problem is that during this time, A's forwarding pointer currentdir does not
have a significant meaning. Handling the requests of others will cause cycles among the non-ring
processors.
Since processor A cannot handle foreign requests and cannot block these requests internally,
processor A will block them externally. During the time that processor A is requesting the token
but has not yet been told that it is in the waiting ring, processor A will respond to token requests
by linking them into a blocking list. The blocking list is managed as a LIFO. When a request
arrives, Processor A responds with a Block message, with the address of the previous head of the
blocking list (or a null pointer if the list is empty). When processor A joins the waiting ring, it
sends an Unblock message to the head of the blocking list. The Unblock message is relayed down
the chain of blocked processors. After unblocking, the processors resubmit their requests to A.
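
A sketch (ours) of the external blocking list: processor A keeps only the name of the most
recently blocked requestor, and each Block message carries the previous head, so the links live
at the blocked requestors themselves.

blocked_head = None       # the only state A keeps: the most recent blocked requestor
next_blocked = {}         # link carried in each requestor's Block message

def block(requestor):
    # A replies to the request with a Block message naming the old head (a LIFO push).
    global blocked_head
    next_blocked[requestor] = blocked_head
    blocked_head = requestor

def unblock_all():
    # When A joins the waiting ring it wakes the head; the Unblock message is
    # relayed down the chain and every requestor resubmits its Token-Request to A.
    requestor = blocked_head
    while requestor is not None:
        print(requestor, "resubmits its Token-Request to A")
        requestor = next_blocked[requestor]

block("B"); block("C"); block("D")
unblock_all()    # D, C, B resubmit, in LIFO order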

3.3 Miscellaneous

When a processor accepts the token, it sends a message out to repair the ring. The ring must be
repaired before the token can be released back into the ring. Therefore, when the ring is repaired
an acknowledgement is sent to the token holder. The token holder blocks its release of the token
until the acknowledgement that the ring is repaired has been received.
When a processor is in the waiting ring, it does not block new processors from entering the
ring. In particular, the lowest priority processor will admit new processors into the ring during
the time between sending a marked token to the highest priority process and receiving a request to repair
the ring. Only the processor that points to the token holder can repair the ring, so the request is
passed along in the waiting ring until it reaches a processor that points to the token holder. This
processor repairs the ring and sends an acknowledgement to the token holder.


4 The Algorithm

In this section we present the code for the algorithm that we have described in the previous two
sections.
The protocol uses the following variables:

Boolean tokenhldr       True iff the processor holds the token.
Boolean incs            True iff the processor is using the token.
Boolean in_ring         True iff the processor knows it is in the waiting ring.
Boolean isrequesting    True when the processor has made a token request but hasn't joined
                        the waiting ring.
Boolean change_acked    True if the waiting ring has been repaired.
Boolean areblocking     True if the processor is blocking token requests.
Boolean areblocked      True if you are in a list of blocked requests.
Boolean lastblocked     True if you are the last processor in the list of blocked requests.
Integer currentdir      Direction of the successor processor.
Integer blocked_dir     Head of a list of blocked requestors.
Integer unblock_dir     Next blocked requestor.
Integer id              Name of the processor (never changes).
Real priority           The priority of the processor's request.
Real link_pri           Priority of the successor's request.
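
As a compact illustration, the per-processor state above could be bundled as follows (a sketch in
Python with our own field defaults, not part of the original protocol); note that it holds only a
constant number of processor names and priorities, hence O(log n) bits.

from dataclasses import dataclass

@dataclass
class ProcessorState:
    id: int                         # name of the processor (never changes)
    tokenhldr: bool = False         # holds the token
    incs: bool = False              # using the token
    in_ring: bool = False           # knows it is in the waiting ring
    isrequesting: bool = False      # requested but not yet in the ring
    change_acked: bool = True       # the waiting ring has been repaired
    areblocking: bool = False       # currently blocking token requests
    areblocked: bool = False        # sitting in a list of blocked requests
    lastblocked: bool = False       # last processor in the blocked list
    currentdir: int = -1            # direction of the successor processor
    blocked_dir: int = -1           # head of the list of blocked requestors
    unblock_dir: int = -1           # next blocked requestor
    priority: float = 0.0           # priority of this processor's request
    link_pri: float = 0.0           # priority of the successor's request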


The protocol is specified by specifying how events are handled. We assume that each event is
handled atomically at the processor where it occurs, except where a specific condition for waiting
is specified. A processor makes an event occur at a remote processor by sending a message. The
parameters of the send procedure are:
send(processor id, event; parameters for the event).
A message is sent to cause the specified event to occur on the remote processor, and the event
is passed the parameters.
The protocol driver takes events off of a queue and calls the appropriate routine to handle them.
The events may be due to the receipt of messages, or may be caused internally.

DPQ_handler()
    while(1)
        get an event from the event queue
        call the appropriate routine to process the event

When a processor needs to use the token, it generates a REQUEST_TOKEN event. We note
here a particular use of the currentdir variable. If the processor holds the token and currentdir
points to the processor, then there is no waiting ring and hence no blocked processors. If currentdir
points to a remote processor, that processor is part of the waiting ring.

REQUEST_TOKEN(request_priority)
    priority=request_priority
    if(tokenhldr) // If you hold the token, use it.
        incs=TRUE
        change_acked=TRUE // The ring doesn't need repair.
        currentdir=id
        in_ring=FALSE
    else
        send(currentdir,RECEIVE_RQST; id,priority)
        isrequesting=TRUE



A processor enters the waiting ring when it receives a REQUEST_DONE message. Later, the
processor will receive the token. Since the processor can handle token requests now, it will wake
up any blocked requests.

REQUEST_DONE(successor,successor_priority)
    in_ring=TRUE
    currentdir=successor
    link_pri=successor_priority
    isrequesting=FALSE

    if(areblocking) // Wake up blocked requests, if any.
        send(blocked_dir,UNBLOCK; id)
        areblocking=FALSE



Most of the work of the protocol is in the RECEIVE_RQST event, which handles remote requests
to obtain the token. The handling of the event depends on the state of the processor (holding the
token, using the token, in the waiting ring, requesting but not in the waiting ring, not requesting).
Note that the protocol as written requires a processor to enter the waiting ring before receiving the
token, so two messages must be sent if the critical section is idle. This can be optimized by using
a single special message.

RECEIVE_RQST(requestor,request_pri)
    if(tokenhldr and !incs) // Send the token.
        currentdir=requestor
        send(requestor,REQUEST_DONE; requestor,request_pri)
        send(requestor,TOKEN; id,FALSE,TRUE)
        tokenhldr=FALSE

    else if(tokenhldr and incs)
        if(currentdir==id) // No waiting ring, so create one.
            link_pri=request_pri
            currentdir=requestor
            send(requestor,REQUEST_DONE; requestor,request_pri)
        else // Waiting ring exists, so forward the request there.
            send(currentdir,RECEIVE_RQST; requestor,request_pri)

    else if(in_ring)
        if(priority<link_pri and request_pri<priority) // Requestor is the new lowest priority processor.
            send(requestor,REQUEST_DONE; currentdir,link_pri)
            currentdir=requestor
            link_pri=request_pri
        else if((request_pri<=priority and request_pri>link_pri) or
                (priority<link_pri and request_pri>link_pri)) // Found its position in the ring.
            send(requestor,REQUEST_DONE; currentdir,link_pri)
            currentdir=requestor
            link_pri=request_pri
        else // Not the right position, so forward the request.
            send(currentdir,RECEIVE_RQST; requestor,request_pri)

    else // Not token holder, not in waiting ring.
        if(!isrequesting) // Not requesting, so forward the request.
            send(currentdir,RECEIVE_RQST; requestor,request_pri)
            currentdir=requestor
        else // Can't forward the request, so block it.
            if(areblocking) // Already blocking, so push onto the blocking list.
                send(requestor,BLOCK; blocked_dir,FALSE,id)
            else // First blocked process.
                send(requestor,BLOCK; NULL,TRUE,id)
                areblocking=TRUE
            blocked_dir=requestor



When the token holder releases the token, it sends the token to the waiting ring, if it exists. If
no waiting ring exists, then currentdir will be pointing to the token holder.

RELEASE_TOKEN()
    wait until change_acked is TRUE

    change_acked=FALSE
    incs=FALSE
    if(currentdir!=id) // There is a waiting ring, so release the token into it.
        send(currentdir,TOKEN; id,FALSE,FALSE)
        tokenhldr=FALSE



This event unblocks the release of the token.

CHANGE_ACK()
    change_acked=TRUE


The lowest priority process tags the token. When a processor receives the token, it accepts the
token only if the token is tagged. As an optimization, the process can accept the token if it is









alone in the waiting ring. After accepting the token, the token holder points to the processor that
sent the token, which is likely to be the lowest priority processor in the waiting ring. Finally, the
waiting ring is repaired.

TOKEN(sender,tag,direct)
    wait until in_ring is TRUE

    if(tag or currentdir==id) // Accept the token.
        tokenhldr=TRUE
        incs=TRUE
        if(currentdir==id) // If you are alone, you don't need to repair the ring.
            change_acked=TRUE
        else
            if(!direct) // If the token holder didn't send the token directly, the
                        // sender is probably the lowest priority process.
                send(sender,CHANGE_LINK; id,currentdir,link_pri)
                currentdir=sender
            else
                send(currentdir,CHANGE_LINK; id,currentdir,link_pri)
            in_ring=FALSE
    else // Not the highest priority requestor, so pass the token along.
        if(priority<link_pri) // The lowest priority process marks the token.
            send(currentdir,TOKEN; id,TRUE,FALSE)
        else
            send(currentdir,TOKEN; id,FALSE,FALSE)



This event repairs the ring after a new process accepts the token.

CHANGE_LINK(tokhldr,successor,succ_pri)
    wait until in_ring is TRUE

    if(currentdir==tokhldr)
        currentdir=successor
        link_pri=succ_pri
        send(tokhldr,CHANGE_ACK)
    else
        send(currentdir,CHANGE_LINK; tokhldr,successor,succ_pri)

A requesting processor blocks if its request is forwarded to another requesting processor that
has not yet joined the waiting ring. The blocked processors form a LIFO list, which removes the
need for O(n) storage per processor.

BLOCK(next_blocked,islast,blocker)
    unblock_dir=next_blocked
    lastblocked=islast
    currentdir=blocker
    areblocked=TRUE









When a requesting processor joins the waiting ring, it unblocks the head of the blocking list,
which unblocks its successor and so on.

UNBLOCK(unblocker)
    wait until areblocked is TRUE

    if(!lastblocked) // Relay the unblocking.
        send(unblock_dir,UNBLOCK; unblocker)
    currentdir=unblocker
    send(currentdir,RECEIVE_RQST; id,priority) // Resubmit your request.
    isrequesting=TRUE
    areblocked=FALSE


5 Correctness

In this section, we give some intuitive arguments for the algorithm's correctness. We loosely refer
to events as occurring at a point in 'time'. While global time does not exist in an asynchronous
distributed system, we can view the events in the system as being totally ordered using Lamport
timestamps [10], and view a point in time as being a consistent cut [3].
We note that all processors that are not requesting the token lie on a path that leads to a processor
that either holds or is requesting the token. This property can be seen by induction. We assume
the property holds initially (and this is required for correctness). The property can change if a
processor modifies its currentdir pointer, or if the processor it points to changes its state. A
processor that is not requesting the token will change its currentdir pointer if it relays a request.
But then, it points to the requesting processor. A non-requesting processor can also change its
pointer if it sends the token to another processor, but currentdir is set to the new token holder.
A processor can change its state from not requesting to requesting, but the property still holds.
Finally, a processor can change its state from holding the token to not-requesting. But, after
changing state the processor points to the new token holder or to a requesting processor.
The token is not lost because it is only released to processors in the waiting ring (and a processor
must enter the waiting ring before accepting the token). The token is released to the highest priority
process in the waiting ring at the time that the lowest priority process in the ring handles the token.


6 Performance

To test the performance of the distributed priority lock, we wrote a simulation of the algorithm. The
simulation modeled a set of processes that communicate through message passing. The parameters
to the simulator are the number of processors, the message transit delay, the message processing
delay, the time between releasing the token and requesting it again (the inter-access time), and the









time that a token is held once acquired (the release delay). All delays are exponentially distributed.
The priority of a request is an integer chosen uniformly randomly between 1 and 10,000.
We ran the simulator for varying numbers of processors and varying loads, where we define the
load to be the product of the number of processors and the release delay divided by the inter-access
time. For each run, we executed the simulation for 100,000 critical section entries. We collected
the number of hops a token request makes, and the number of hops that the token makes before it
is accepted by a new token holder.
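
For example (with hypothetical parameter values of ours, not the settings used for the reported
runs), the load is computed as:

# load = (number of processors) * (release delay) / (inter-access time)
n_processors = 64
release_delay = 2.0          # mean time the token is held once acquired
inter_access_time = 256.0    # mean time between releasing and re-requesting the token

load = n_processors * release_delay / inter_access_time
print(load)    # 0.5: below 1, so the token experiences idle time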
Figure 4 shows the average number of hops that a request must make before the requesting
processor joins the waiting ring. If the load is less than one (so that the token experiences idle
time), the number of request hops experiences a logarithmic growth (with a base of about 3). The
number of hops required to register a request is smaller if the load is 1 or larger than if the load
is less than 1. This result seems counter-intuitive, because the waiting ring becomes large when
the load is large, and might force a request to make many hops. However, when the waiting ring is
large, there are usually many requests in it with a low priority. The lowest priority processor will
remain the lowest priority processor for a long period of time. After a processor releases the token,
it points to the lowest priority processor. When the processor submits a new request, it needs only
1 hop for its request to reach the waiting ring. The request is likely to have a higher priority than
most of the requests in the waiting ring. Since the request arrives at the lowest priority processor,
it does not need to search through those requests.

Figure 4: Request overhead (average number of request hops vs. number of processors, for loads
of 0.5, 0.75, 1, and 2).
Figure 5 shows the average number of hops that the token must make to move from the old
token holder to the new token holder. If the load is less than 1, the token can often be released
directly to the requestor. If the load is 1 or greater, there is usually a waiting ring. Even though
the protocol can require that the token make many hops to reach the new token holder, these
cases rarely occur. This phenomenon is shown in Figure 6. When the load is light, the token must
occasionally make an extra hop. However, this extra overhead is balanced by the occasions when
the token can be released directly to the requestor. When the load becomes large, the token rarely
needs to make an additional hop because the lowest priority processor is rarely replaced.

Figure 5: Token hops per critical section entry (average number of token hops vs. number of
processors, for loads of 0.5, 0.75, 1, and 2).

Figure 6: Extra token transmissions per critical section entry, vs. number of processors, for loads
of 0.5, 0.75, 1, and 2.
The synchronization algorithm occasionally requires that a processor delay its release of the
token. If this delay is a common occurrence, it can slow down system processing by increasing the
effective critical section time. Figure 7 shows that even when the critical section is short and the
load is high, token releases are blocked only a few percent of the time.

Figure 7: Number of token releases that are blocked (probability of a blocked release vs. number
of processors, for loads of 0.5, 0.75, 1, and 2).


7 Conclusion

We have presented an algorithm for prioritized distributed synchronization. The algorithm uses the
path compression technique of Li and Hudak for fast access and low message passing overhead. We
make a novel use of distributed lists to obtain a low storage overhead at each processor. The O(log n)
message passing overhead per request and the O(log n) bits of storage overhead per processor make
the algorithm scalable. We present simulation results that show the performance of the algorithm
is good in practice.
The techniques presented in this paper can be applied to additional synchronization structures,
such as non-prioritized locks, reader/writer locks, and barriers. In our future work, we will examine
these applications.


References

[1] K. BIRMAN, A. SCHIPER, AND P. STEPHENSON, Lightweight causal and atomic group multi-
cast, ACM Trans. on Computer Systems, 9 (1991), pp. 272-314.

[2] O. CARVALHO AND G. ROUCAIROL, On mutual exclusion in computer networks, Comm. of
the ACM, 26 (1983), pp. 146-147.

[3] K. CHANDY AND L. LAMPORT, Distributed snapshots: Determining global states of distributed
systems, ACM Transactions on Computer Systems, 3 (1985), pp. 63-75.

[4] T. CRAIG, Queuing spin lock alternatives to support timing predictability, tech. rep., University
of Washington, 1993.










[5] A. GOSCINSKI, Two algorithms for mutual exclusion in real-time distributed computer systems,
The Journal of Parallel and Distributed Computing, 9 (1990), pp. 77-82.

[6] K. HARATHI AND T. JOHNSON, A priority synchronization algorithm for multiprocessors,
Tech. Rep. tr93.005, University of Florida, 1993. Available at ftp.cis.ufl.edu:cis/tech-reports.

[7] M. HILL, J. LARUS, S. REINHARDT, AND D. WOOD, Cooperative shared memory: Software
and hardware for scalable multiprocessors, ACM Trans. on Computer Systems, 11 (1993),
pp. 300-318.

[8] D. JAMES, A. LAUNDRIE, S. GJESSING, AND G. SOHI, Scalable coherent interface, Computer,
23 (1990), pp. 74-77.

[9] A. KUMAR, Hierarchical quorum consensus: A new algorithm for managing replicated data,
IEEE Trans. on Computers, 40 (1991), pp. 994-1004.

[10] L. LAMPORT, Time, clocks, and the ordering of events in a distributed system, Communications
of the ACM, 21 (1978), pp. 558-564.

[11] K. LI AND P. HUDAK, Memory coherence in shared virtual memory systems, ACM Trans. on
Computer Systems, 7 (1989), pp. 321-359.











[12] M. MAEKAWA, A sqrt(N) algorithm for mutual exclusion in decentralized systems, ACM Trans.
on Computer Systems, 3 (1985), pp. 145-159.

[13] E. MARKATOS AND T. LEBLANC, Multiprocessor synchronization primitives with priorities,
tech. rep., University of Rochester, 1991.

[14] J. MELLOR-CRUMMEY AND M. SCOTT, Algorithms for scalable synchronization on shared-
memory multiprocessors, ACM Trans. on Computer Systems, 9 (1991), pp. 21-65.

[15] K. RAYMOND, A tree-based algorithm for distributed mutual exclusion, ACM Trans. on Com-
puter Systems, 7 (1989), pp. 61-77.

[16] G. RICART AND A. AGRAWALA, An optimal algorithm for mutual exclusion in computer
networks, Comm. of the ACM, 24 (1981), pp. 9-17.

[17] R. H. THOMAS, A majority consensus approach to concurrency control for multiple copy
databases, ACM Transactions on Database Systems, 4 (1979), pp. 180-209.

[18] T. WOO AND R. NEWMAN-WOLFE, Huffman trees as a basis for a dynamic mutual exclusion
algorithm for distributed systems, in Proceedings of the 12th IEEE International Conference
on Distributed Computing Systems, 1992, pp. 126-133.



























