Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: Non-blocking algorithms for concurrent data structures
Permanent Link: http://ufdc.ufl.edu/UF00095053/00001
 Material Information
Title: Non-blocking algorithms for concurrent data structures
Series Title: Department of Computer and Information Science and Engineering Technical Reports
Physical Description: Book
Language: English
Creator: Prakash, Sundeep
Fishwick, Paul A.
Johnson, Theodore
Publisher: Department of Computer and Information Sciences, University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: July 1, 1991
Copyright Date: 1991
 Record Information
Bibliographic ID: UF00095053
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.

Non-Blocking Algorithms for Concurrent Data Structures


Sundeep Prakash, Yann-Hang Lee, Theodore Johnson
Dept. of Computer and Information Sciences
University of Florida
Gainesville, FL 32611



July 1, 1991


Abstract

Non-blocking algorithms for concurrent data structures guarantee that a data structure is always accessible, in contrast to blocking algorithms in which a slow or halted process can render part or all of the data structure inaccessible to other processes. In this paper, we first develop a method to design non-blocking algorithms for any concurrent data structure, using the compare&swap operation as the basic synchronization primitive. We use the example of queues to demonstrate the method. In this general method, many processes are allowed to concurrently access the data structure, but modifications to the data structure are made in serial order. We then deal with the problem of increasing the achieved concurrency of access (the number of modifications that can be made simultaneously) to the data structure. For the general case, we give a method to convert any locking algorithm into a non-blocking algorithm. This transformation is done by having the locks contain the information required to complete the operation. This allows processes blocked by a lock to complete the operation of the process holding the lock. The achieved concurrency of the obtained non-blocking algorithm turns out to be the same as that of the equivalent locking algorithm.



1 Introduction


Algorithms for concurrent access to shared data structures fall into two broad categories: blocking and non-blocking. Blocking algorithms are those in which a process trying to read or modify the data structure isolates or locks part or all of the data structure to prevent interference from other processes [1, 2, 3, 4]. The problem with the blocking approach is that in an asynchronous system with processes having different speeds, a slower process might prevent faster processes from accessing the data structure. Non-blocking algorithms, on the other hand, ensure that the data structure is always accessible to all processes, and that a (temporarily or permanently) inactive process cannot render the data structure inaccessible. Such an algorithm guarantees that some active process will be able to complete an operation in a finite number of steps [5], making the algorithm robust with respect to process failures.



1.1 Review of Current Work


Shared data structures are required in a number of multiprocessing applications. We
shall deal with lists in this paper, although the methods described can be easily ap-
plied to any data structure. Lists are interesting because they are required for many
operating system functions [2] and can be used to make more complex data struc-
tures, such as buddy system memory managers [6]. Non-blocking implementations
are desirable due to their robustness and continued fast operation in the presence
of processes with varying speeds. Many concurrent implementations are in existence
[2, 6, 7, 8] but non-blocking implementations are few. We give a brief description of
some important ones.

Lamport [9] gives a wait-free implementation for FIFO queues but restricts the concurrency to a single enqueuer and a single dequeuer (an algorithm is wait-free if every process can complete an operation in a finite number of steps [5]). Gottlieb et al. [7] give a blocking algorithm for enqueuing and dequeuing using the replace-add and swap operations for synchronization. This implementation allows a high degree of parallelism limited only by the predefined maximum queue size. However, it is possible for an enqueuer or dequeuer to block other dequeuers and enqueuers. Stone [8] gives a 'non-delaying' implementation for the same, using the compare&swap operation for synchronization, which allows an arbitrary number of enqueuers and dequeuers, but it is possible for a faulty or slow enqueuer to block all the dequeuers. Herlihy and Wing [10] give a non-blocking algorithm using the compare&swap operation. This also permits an arbitrary number of enqueuers and dequeuers but is impractical as it requires an infinite array size for continued operation. Herlihy [5] presents a general methodology for automatically transforming sequential code for any data structure into a concurrent non-blocking or wait-free implementation in which the memory requirements for each concurrent process grow in proportion to the total number of concurrent processes. Treiber [11] gives a non-blocking algorithm for concurrent FIFO access to a shared queue. The enqueue operation requires only a single step, but the time taken for the dequeue operation is proportional to the number of objects in the queue, which is inefficient for large queue lengths and many simultaneous dequeue attempts.



1.2 Synchronization Primitives


We use the compare&swap operation for all the algorithms presented in this paper. As proved by Herlihy [12], it is impossible to design non-blocking or wait-free implementations of many simple data structures using other well known synchronization primitives, i.e. read, write, test&set, fetch&add and swap. However, the compare&swap operation in its simple form has a standard difficulty (the A-B-A problem [13]). The solution is to use the modified compare&swap operation (also described in [13]), which is what we have done. In the next section we describe the compare&swap operation, the A-B-A problem and its solution.



1.3 The A-B-A Problem


We use the implementation of the compare&swap operation found in the IBM/370 architecture (the synchronization primitive proposed for the Cedar supercomputer at the University of Illinois [14] has all the capabilities of compare&swap). It is a three operand atomic instruction of the form CS(A,B,C), where A, B and C are one word variables.1 The instruction does the following:


If A equals C
then put B into C, return condition code 0
else put C into A, return condition code -1



It is used in the following manner: C is a shared variable and A is the private copy of
it made sometime earlier by a process. B is the value which it is attempting to put in
C. It is allowed to do so only if C has not been modified by some other process since
this process made a copy of it. If the attempt fails, the current value of the shared
variable is returned. We do not use this last feature in our algorithms.
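This usage pattern can be sketched in C. The function below is only a sequential model of the instruction's semantics, with names of our own choosing; a real implementation must execute as a single atomic instruction, e.g. via C11's atomic_compare_exchange_strong.

```c
#include <stdint.h>

/* Sequential model of the IBM/370 CS(A,B,C) semantics (names ours).
 * *a is the caller's private copy of the shared word *c, and b is
 * the intended new value.  Returns condition code 0 on success; on
 * failure returns -1 and refreshes *a with the current value of *c. */
int cs(uint32_t *a, uint32_t b, uint32_t *c) {
    if (*a == *c) {
        *c = b;          /* shared variable unchanged since our read */
        return 0;
    }
    *a = *c;             /* C was modified: return its current value */
    return -1;
}
```

A process typically retries in a loop: reread C into A, recompute the desired value B, and attempt CS again until condition code 0 is returned.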

The A-B-A problem occurs when C is the same as A even though C has been modified a number of times since the process made a copy of it. A compare&swap operation performed by the process now will succeed, which can cause errors to occur

1We do not give the exact location of the variables at the time of instruction execution as it is
not important here.

in many concurrent implementations of objects. To prevent this error, a counter is appended to C and is incremented every time a modification is made to C. This leads to the compare&swap double operation, a five operand atomic instruction of the form CSDBL(A1,A2,B1,B2,C) which does the following (|| denotes concatenation):


If A1 || A2 equals C
then put B1 || B2 into C, return condition code 0
else put C into A1 || A2, return condition code -1



Now, the shared variable C is twice the size of the other variables, since one half of C is used as a counter (we shall assume it to be the second half). A process first reads the value of C and puts it in A1 and A2. It puts the desired new value of C into B1, assigns to B2 the value A2 + 1, and then executes the CSDBL instruction. Although the A-B-A problem can still occur, the probability is much lower [8] and is acceptable for a large class of applications.
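The counter discipline can be sketched as follows. This is again a sequential model with our own names, and C is modeled as a struct of two words rather than one double-width word; a real implementation would use a double-width atomic compare-exchange.

```c
#include <stdint.h>

/* The shared variable: a value plus a modification counter. */
typedef struct { uint32_t val; uint32_t count; } versioned_t;

/* Sequential model of CSDBL(A1,A2,B1,B2,C) (names ours). */
int csdbl(uint32_t *a1, uint32_t *a2, uint32_t b1, uint32_t b2,
          versioned_t *c) {
    if (*a1 == c->val && *a2 == c->count) {
        c->val = b1;
        c->count = b2;
        return 0;
    }
    *a1 = c->val;               /* return the current value of C */
    *a2 = c->count;
    return -1;
}

/* Typical use: read C, bump the counter, attempt the update. */
int try_update(versioned_t *c, uint32_t newval) {
    uint32_t a1 = c->val, a2 = c->count;
    return csdbl(&a1, &a2, newval, a2 + 1, c);
}
```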

The rest of the paper is organized as follows. In Section 2, we present a method for designing efficient non-blocking algorithms for any data structure, and we apply it to lists by using the example of FIFO queues with interior access. All modifications to the data structure are made in a serial fashion, and in Section 3 we discuss the problem of increasing the concurrency in making these modifications. In that section we describe a method for converting any locking algorithm into a non-blocking algorithm. The obtained non-blocking algorithm thus has the concurrency of the locking algorithm, with the benefits of a non-blocking implementation. Section 4 concludes.



2 Designing Single Stream Non-Blocking Algorithms


In this section, we present a simple strategy for designing non-blocking algorithms for any data structure by serializing attempts to modify the data structure and posting the operations of the successful attempt so that all processes can assist in its completion. We call these algorithms single stream non-blocking algorithms. These algorithms require a total of O(n) units of memory for all n concurrent processes.

2.1 The General Method


Operations on a data structure essentially involve the reading and changing of some
pointers and data in it. We impose a condition that only the operation of a single
process may be done on the data structure at a time. When a process comes into
the system and observes that no operation is currently in progress (this information
is globally available), it proceeds to read the data structure including the parts it
wants to modify. If, when it finishes reading, there is (or has been) still no operation,
then it has successfully taken a "snapshot" (atomic read) of the data structure. Now,
it simply prepares a list of the addresses of the variables (pointers and data) to be
changed, their current values, and their intended new values and tries to post this
information at a common location as the operation to be executed. A number of
processes may attempt to post their own operations but only one succeeds. The
successful process can now proceed to complete its operation by actually making the
changes to the data structure. The unsuccessful processes are not blocked since they
can help complete the posted operation, and then continue their own operations.

The algorithm is non-blocking since all the processes can read the posted operation (called the current operation) and attempt to change the pointers (and data) themselves. Each change is made by reading the old and new values of the variable from the posted operation, and then making the actual change using the compare&swap operation. Due to the nature of the compare&swap instruction (only an instruction with the correct current value of the variable can succeed in modifying it), each change can be made only once. Repeated attempts (by other concurrent processes) to make the same change simply have no effect. The first process to finish the operation (i.e. all the changes) makes a globally visible indication that the current operation is over, so processes can start trying to read (or reread) the data structure. Processes finishing the operation subsequently will not be able to make any such indication, and so cannot harm new operations posted at a later time. Again, this is achieved using the compare&swap operation and shall become evident when we describe an example in the following section. In this way, changes are serially made to the data structure.
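The posting-and-helping protocol just described might look as follows in C. All names here are ours, C11 atomics stand in for compare&swap, and the counts needed to guard against the A-B-A problem are omitted for brevity; this is a sketch of the idea, not the paper's code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* One intended change: where, the value we read, the value we want. */
typedef struct { _Atomic long *addr; long oldval; long newval; } change_t;

/* An operation information block: a short list of changes. */
typedef struct { change_t changes[4]; int nchanges; } op_t;

static _Atomic(op_t *) current_op = NULL;  /* globally visible operation */

/* Any process (poster or helper) applies the posted changes.  Because
 * each compare-exchange requires the recorded old value, each change
 * takes effect exactly once; repeats simply bounce off. */
static void help_complete(op_t *op) {
    for (int i = 0; i < op->nchanges; i++) {
        long expect = op->changes[i].oldval;
        atomic_compare_exchange_strong(op->changes[i].addr, &expect,
                                       op->changes[i].newval);
    }
    op_t *cur = op;            /* only the first finisher clears it */
    atomic_compare_exchange_strong(&current_op, &cur, NULL);
}

/* Try to post our operation; on failure, help whoever won.  Retrying
 * (rereading the structure and re-preparing) is left to the caller. */
static int try_post(op_t *op) {
    op_t *expect = NULL;
    if (atomic_compare_exchange_strong(&current_op, &expect, op)) {
        help_complete(op);
        return 0;
    }
    if (expect != NULL)
        help_complete(expect); /* assist the winning process */
    return -1;
}
```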


2.2 An Example: FIFO Queues with Interior Access

We assume a FIFO queue with interior access that allows four different operations.
We use a doubly linked list implementation and define the following operations:

* Enqueue at the tail

* Dequeue at the head


* Interior deletion of an object (a pointer to the object is given).


* Interior insertion of an object after another object (given pointers to both the
objects)


In both the interior insert and delete operations, we do the desired operation only
if the element is actually in the queue. Otherwise, the procedure implementing the
operation just informs the caller that the required element is in use somewhere. The
data structures for the queue are shown in Fig. 1. All pointers have a count along
with the actual pointer (in order to avoid the A-B-A problem and to be able to tell
if the pointer has been changed since we last read it). In a data structure in which
some data needs to be modified in some operation, even the variables containing the
data should have a count appended to them.

The head of the queue is pointed to by Head and the tail by Tail. Information
about the current operation is stored in an operation information block pointed to
by Current_op. An assumption made here is that the objects in the queue and
the blocks storing the operation information are never destroyed, but returned to a
shared pool (which can easily be maintained as a non-blocking stack [11]). We must
make this assumption because some slower processes may read an object even after
it is dequeued, or may try to read an operation information block even after it is no
longer current.
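Such a shared pool can be kept as a Treiber-style non-blocking stack [11]. A minimal sketch in C11 follows; the names are ours, and the per-pointer count the paper attaches to every pointer (to avoid the A-B-A problem) is omitted here, though a real pool would attach one to the top-of-stack pointer.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *next; } node_t;

static _Atomic(node_t *) pool_top = NULL;

/* Push a retired object onto the pool. */
void pool_push(node_t *n) {
    node_t *top;
    do {
        top = atomic_load(&pool_top);
        n->next = top;
    } while (!atomic_compare_exchange_weak(&pool_top, &top, n));
}

/* Pop an object for reuse, or NULL if the pool is empty. */
node_t *pool_pop(void) {
    node_t *top;
    do {
        top = atomic_load(&pool_top);
        if (top == NULL)
            return NULL;
    } while (!atomic_compare_exchange_weak(&pool_top, &top, top->next));
    return top;
}
```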

The enqueue procedure is shown in Fig. 2. In this procedure, the process first makes a copy of Current_op. If it is not NULL then it completes the indicated operation (this will soon be described), else it proceeds to read Head, Tail, Tail->right and the address of Tail->right. It makes private copies of all these variables when it reads them. It then checks if Current_op is unchanged. If so, no operations were performed on the queue while it was reading these pointers, so the process makes a decision based on the values it read (which have been captured in private variables). If the queue is empty, then both Head and Tail have to be shifted onto the object. Otherwise, the object has to be appended to the object pointed to by Tail, and Tail has to be shifted to point to the new object. For each of these cases, it puts the addresses of the variables to be changed, their initial values (which have just been read) and the new values in its own private operation information block. The process then tries to swing Current_op to its operation (from NULL). If it succeeds then its operation is guaranteed to execute. This is the point at which an operation gets committed to the data structure. The process now attempts to perform the operation. Other processes may read Current_op and try to do the operation themselves. They do this by first taking a snapshot of the operation by reading Current_op, copying the current operation into a temporary private block, and checking to ensure that Current_op is unchanged. If it has changed then the operation information block that has just been read is no longer current, so the copy of it just made should be

discarded. If a valid copy of the current operation is made, then the process can proceed to complete the operation itself (by reading the copy). However, since the initial values of the pointers are already provided and do not have to be read, and due to the nature of the compare&swap operation, each change to the data structure can be made only once. Subsequent attempts have no effect. The first process to complete the operation simply swings Current_op back to NULL. Again, since even this modification is performed by the compare&swap operation, and Current_op had been read before operation execution, only the first process to complete the operation will succeed in changing Current_op to NULL.
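The decision step of the enqueue, turning the snapshot into a list of intended changes, can be sketched like this. The types and names are ours and the counts attached to each pointer are omitted; the new object's own left/right fields can be set privately before posting, since no other process yet holds a reference to it.

```c
#include <stddef.h>

typedef struct obj { struct obj *left, *right; } obj_t;

/* One intended change: where, the value we read, the value we want. */
typedef struct { void *addr; void *oldv, *newv; } change_t;
typedef struct { change_t ch[3]; int n; } op_t;

/* Fill the operation information block for enqueueing `nu` at the
 * tail, given a snapshot (head, tail) of Head and Tail. */
void prepare_enqueue(op_t *op, obj_t **head_addr, obj_t **tail_addr,
                     obj_t *head, obj_t *tail, obj_t *nu) {
    op->n = 0;
    if (tail == NULL) {
        /* empty queue: shift both Head and Tail onto the object */
        op->ch[op->n++] = (change_t){ head_addr, head, nu };
        op->ch[op->n++] = (change_t){ tail_addr, tail, nu };
    } else {
        /* append after the current tail, then shift Tail */
        op->ch[op->n++] = (change_t){ &tail->right, tail->right, nu };
        op->ch[op->n++] = (change_t){ tail_addr, tail, nu };
    }
}
```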

The interior insert procedure is very similar. We assume that the procedure is given a pointer to an object (objectptr), and it is required to insert another object (newobject) after this object (if it is present in the queue). Again, the process reads Tail, objectptr->right, objectptr->left, objectptr->right->left and the addresses of these variables (values of non-existent variables are not read). After getting these values (and ensuring that Current_op is unchanged) it makes the decisions that would be made in a serial algorithm. If Tail (and consequently Head) is NULL, or if objectptr->right and objectptr->left are both NULL but both Head and Tail are not the same as objectptr, then the object is not in the queue, so no insertion is to be done. The only other cases are that the object is not at the tail or that the object is at the tail. If the object is at the tail, the new object has to be inserted at the end of the linked list and the tail shifted to it. Otherwise, it has to be inserted somewhere between two elements of the queue. Again, for each of these cases, it prepares a list of the variables to be changed, their addresses, and their current and intended new values, and proceeds exactly as in the enqueue procedure. We do not give code for this and the remaining procedures since they are very similar to the enqueue procedure.

In the interior delete procedure, the pointers (and addresses) that need to be read are Head, Tail, objectptr->left, objectptr->right, objectptr->left->right and objectptr->right->left. The cases that arise are:

* Either Tail is NULL (queue empty), or objectptr->left and objectptr->right are NULL but Head.ptr is not the same as objectptr (object not in the queue). The desired object is not in the queue, so no operation need be done.

* The object is the only object in the queue (Head = Tail = objectptr), so both the head and tail need to be shifted.

* The object is at the tail (Tail = objectptr ≠ Head), so the tail has to be shifted in addition to delinking the object.

* The object is at the head (Head = objectptr ≠ Tail), so the head has to be shifted in addition to delinking the object.


* The object is neither at the head nor at the tail (objectptr->left, objectptr->right ≠ NULL). The object only has to be delinked.


In each of these cases the addresses, initial values and intended values of the pointers
are put in the operation information block and the process continues in the same way
as before.
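The case analysis above can be expressed as a small classifier. This is a sketch with our own names; the real procedure runs it on a validated snapshot, with counts attached to each pointer.

```c
#include <stddef.h>

typedef struct obj { struct obj *left, *right; } obj_t;

typedef enum {
    NOT_IN_QUEUE, ONLY_ELEMENT, AT_TAIL, AT_HEAD, INTERIOR
} del_case_t;

/* Classify the interior-delete cases from a snapshot of Head, Tail
 * and the candidate object p. */
del_case_t classify_delete(obj_t *head, obj_t *tail, obj_t *p) {
    if (tail == NULL ||
        (p->left == NULL && p->right == NULL && head != p))
        return NOT_IN_QUEUE;        /* empty queue, or p not in it */
    if (head == p && tail == p)
        return ONLY_ELEMENT;        /* shift both head and tail */
    if (tail == p)
        return AT_TAIL;             /* shift tail, then delink */
    if (head == p)
        return AT_HEAD;             /* shift head, then delink */
    return INTERIOR;                /* just delink */
}
```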

The dequeue procedure requires that Head, Tail, Head.ptr->right and Head.ptr->right->left be read. Just as in the interior delete procedure, the queue may be empty (Head = Tail = NULL), or there may be a single element in it (Head = Tail ≠ NULL), or there may be more than one element in the queue (Head ≠ Tail ≠ NULL). We do not describe the cases since they are simply a subset of those of the interior delete procedure.


2.3 Correctness


First of all, we assume the correctness of the serial algorithm used to actually make
the decisions for modifying the data structure. To prove the obtained non-blocking
algorithms correct, we consider a complete operation (e.g. enqueue, dequeue) to consist
of three parts: reading the data structure and preparing the operation, swinging the
current operation pointer to our operation, and actually performing the operation.
We show the correctness of these algorithms by claiming that the operations done
by the concurrent process using these algorithms are decisive operation serializable
(concurrent actions are decisive operation serializable [15] if they are serializable [15]
and there is an operation dec(a) in each of the actions such that if dec(ai) occurs
before dec(aj) then ai comes before aj in the serial order).

The decisive operation that establishes the serial order in the above algorithm is the swinging of the current operation pointer to a process' own operation. However, this is not enough to prove that there is an equivalent serial order of the operations, since more than one process may well attempt to do the third part of the operation. It remains to be shown that concurrent attempts to complete a single operation (the current operation) do not interfere with each other or with later operations. This is seen to be true because we use the compare&swap operation to perform each individual change, and once a change has been performed, it can never be performed again, so all subsequent attempts simply "bounce off" the data structure without affecting it. So, no slow process can come along after an operation has been completed and be able to redo the operation. This also applies to the swinging of the current operation pointer back to NULL: this can only be done once due to the nature of the compare&swap operation. We thus see that even though processes cooperate to do the third part of the operation (committing it to the data structure), each change can in fact be

performed only once, which proves that there is in fact an equivalent serial order for
the modifications performed on the data structure.

In this proof, we do not take into account the A-B-A problem, which we assume
(throughout this paper) to have a very low probability of occurrence.


2.4 Analysis

The algorithms we obtained from this method have a system latency of O(n), assuming n concurrent processes. System latency is defined by Herlihy in [5] as the maximum
number of steps a system can take without completing an operation. When one
process succeeds in swinging a pointer to its own operation, it can cause all remaining
n - 1 processes to reread the data structure (after helping complete the current
operation) and then prepare their operations all over again and then attempt to swing
the current operation pointer to their new operations. The memory requirement for
these algorithms is also O(n), where n is the number of concurrent processes. We do
not count the temporary private variables used by processes since they are released
by the process when it goes away (it does not amount to much either). We only count
the memory that is required to be continuously reserved for the data structure. This
memory consists of that occupied by the operation information blocks and elements of
the data structure. Since we never require more than a constant number of operation
information blocks and elements to be held by a process, the total reserve can be
limited by O(n) and can be kept in shared pools (one for the operation information
blocks and one for the elements) which can be maintained as non-blocking stacks (as
mentioned earlier).



2.5 Comparison with Herlihy's Methodology

Herlihy [5] gives a general methodology for transforming sequential code for any data
structure into a concurrent non-blocking or wait-free implementation (an algorithm
is wait-free if every concurrent process can complete an operation in a finite number
of steps [5]). Two protocols are presented for this purpose. The small object protocol requires each process to have O(n) units of memory, where n is the maximum number of processes that can be concurrently present in the system, and has O(n) system latency if a unit time fetch&add instruction is available. Also, this protocol requires
the whole data structure to be replicated each time an operation is done on it. The
large object protocol requires each process to have O(nm) units of memory, where
m is a factor depending on how many parts of the data structure have to be read
to perform the operation. The large object protocol is more time efficient since it

does not require the whole data structure to be replicated every time a modification
is made, as does the small object protocol. Both protocols allow modifications to be
made serially.

The method we have presented improves on that proposed by Herlihy since it has
a much lower memory requirement. This is achieved because, in Herlihy's method,
the data structure is not modified in place (as in our method) but copies of parts
of it are made, the required changes made to the copies, and then the parts are
swapped with the originals. However, this is not the sole reason for the increased
memory requirement. In order to get a view of the data structure, a process must
freeze parts of it, to prevent these parts from being used for making copies if they are
swapped out of the data structure. The amount of memory required to ensure that a
process always has enough memory to use for copies turns out to be very large in the
presence of many concurrent processes (as given above). Our method modifies the
data structure in place and thus avoids the extra memory requirement. The problem
of getting a consistent view of the data structure is countered by reading only when
no operation is in progress, instead of freezing parts of the data structure.

The method presented in this section does not fully exploit the concurrency allowed in the data structure. For example, in a FIFO queue, when there is more than one element in the queue, dequeuing and enqueuing may proceed concurrently. In the next section, we deal with this aspect.



3 Designing Multiple Stream Non-Blocking Algorithms


In the method described in the previous section, we imposed the condition that only
one process may modify the data structure at a time. This was done by having a
single current operation. The algorithms of the previous section thus had an achieved
concurrency of 1 (although many concurrent processes could be present in the system).
This had the following advantages:


1. It is not necessary to put locks or other structures in the data structure, to
prevent clashes between concurrent modifications of parts of the data structure.

2. Since all processes read the data structure in between operations, they do not
see intermediate states of the data structure, so the algorithms for modifying it
are exactly the same as a serial algorithm for the same purpose.


Increasing the concurrency consequently involves preventing clashes and, additionally, modifying the algorithms to be able to handle intermediate states of the data structure. The penalty we thus have to pay is in the complexity of the code.

In order to increase the concurrency, we need to identify where the concurrency
can be introduced. This is a data structure specific problem. For example, in a FIFO queue implemented as a list, if we exclude the possibility of using concurrent atomic operations, e.g. fetch-and-add, enqueuers must proceed serially and dequeuers must proceed serially [16]. However, enqueuers and dequeuers can proceed in parallel if there is more than one element in the queue. Otherwise, they must proceed serially.
So, whenever there are two or more elements in the queue, there can be two concurrent
streams, an enqueuing stream accessing the tail of the queue, and a dequeuing stream
accessing the head of the queue. Also, we need to put guards in the data structure to
combine these two streams into a single stream when there is only a single element
in the queue. Similarly, in the case of FIFO queues with interior access, in addition
to the enqueuing and dequeuing streams there can be as many interior accesses going
on as there are uninterfering processes wanting to do interior accesses.

We now present a general method to obtain multiple stream non-blocking algo-
rithms for those data structures which have a locking algorithm available. The locking
algorithm has the data structure specific features.


3.1 Making Locking Algorithms Non-Blocking


In this section, we describe a method for making locking algorithms non-blocking.
We assume the locking algorithm consists of three parts:

* Setting all the locks needed to modify a part of the data structure.

* After obtaining the locks, doing operations on the data structure. We assume
these operations consist of the changing of pointers and data in the locked part
of the data structure.

* Releasing all the obtained locks.

There may be some work done by the locking algorithm before the first part, for
example, traversing the data structure to reach the part that needs to be modified.
We do not make any assumptions about this part of the locking algorithm, since no
changes are made during this time. We are only concerned with the part that makes
the modifications.

We make the following assumptions about the locking scheme:

* Only exclusive locks are used.


* There is no possibility of deadlock, by locks being put in a cycle (this can lead
to livelock in the transformed non-blocking algorithm).


If we can get a locking scheme for the data structure that satisfies these properties,
then a non-blocking implementation with the same concurrency is guaranteed. For
example, the method described in the previous section is just this transformation
applied to a very simple locking scheme, one in which the whole data structure is
locked for every operation. That was a very conservative locking algorithm, so we
obtained single stream non-blocking algorithms. We can get multiple stream non-
blocking algorithms with locking schemes that take more advantage of the locality of
operations.



3.2 A Description of the Transformation


We assume that until the time comes to set the locks and actually change the data structure, the locking and non-blocking algorithms are identical. In the locking algorithm, we then activate the locks in the parts of the data structure we want to modify (or wait to activate them). In the non-blocking algorithm, instead of activating the locks we simply make copies of the current values of the locks (they may be set). Locks are required to be implemented as a variable with a count and a pointer (the use of the pointer shall become evident soon). The count is incremented by 1 whenever the lock is set or released, so that we can always find out if the lock has been touched since we last read it, and to avoid the A-B-A problem.

In the next step of the locking algorithm, we would proceed to make changes
in pointers (and data too, but we use list algorithms as an example) in the locked
elements. In the equivalent non-blocking algorithm, we simply make private copies of
the addresses and the current values of those pointers. Again, we require the pointers
to have an attached count which will be incremented with each modification, for the
same reason as before. In the locking algorithm, we have the guarantee that no one
will touch the elements once we have locked them, so our operation is guaranteed
to be atomic. In the non-blocking algorithm, we guarantee the same atomicity by
reading the locks of the elements again, after finishing the read of the pointers. If the
locks are unchanged and not set, then no one had modified, or was in the process of
modifying, any of the pointers while we were reading them, so we have a complete
snapshot [16] of everything we need to change. If they are unchanged but set, then we
still have a valid snapshot, but we have caught the data structure in an intermediate
state and we complete the operation indicated by the locks that are set (the method
of doing this shall soon be described). If they are changed, we just start over again


until we get a snapshot. Now, we can prepare our operation privately. We do this
by putting the just read values of the locks (and their addresses), the pointers (and
their addresses), and the new values of the pointers, in an operation information block
(exactly the same as in the previous section, only that it holds more information). As
before, these operation information blocks, and elements of the data structure, may
not be destroyed after use, but must be returned to a shared pool.
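The operation information block just described can be sketched as a C struct; the field names follow the later figures where possible, but the exact layout here is our assumption:

```c
#include <assert.h>

#define MAX 4  /* as in the figures: maximum pointers changed per operation */

/* Sketch of an operation information block: for every pointer the
   operation will change, it records where the pointer lives, the value
   we snapshotted (the compare-and-swap "expected" value), and the value
   we intend to install. Lock bookkeeping is elided here. */
struct opinfo {
    int    number;        /* how many pointers will be changed */
    void **address[MAX];  /* address of each pointer           */
    void  *oldval[MAX];   /* snapshot value, expected by CAS   */
    void  *newval[MAX];   /* value to install                  */
};
```

Committing entry i then amounts to a compare-and-swap of `*address[i]` from `oldval[i]` to `newval[i]`, which is why the block must hold both the old and the new values.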

We have now completed a mock run of the locking algorithm, except that the data
structure has not been touched. The operation is now ready to be committed to the
data structure. Notice that unlike in the two stream FIFO queue implementation in
[16], we do not have to identify unique intermediate states of the data structure for
each type of operation (e.g. enqueue, dequeue in a FIFO queue). All intermediate
states occur while the elements are locked, which is where we find information to
complete the operation.

We now attempt to perform the operation we prepared. We start setting the locks
indicated in the operation information block. This can succeed only if all the locks
are as they were when we did the atomic read. So, if we succeed in putting all the
locks, we know that those elements have still not been touched, so we can correctly
do the operations indicated in the operation information block. If we fail, we start
from the beginning after releasing the locks just set.
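The lock-acquisition step above can be sketched with C11 atomics. Here each lock is modeled as a single counter word in a seqlock style (even = free, odd = held), with the count advancing on every set and release as the text requires; the helper name and this simplified layout are ours, not the paper's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch: try to set every lock with compare-and-swap. Each CAS succeeds
   only if the lock still holds the value captured in the snapshot
   (seen[i], an even count). On any failure, release the locks set so far
   (release advances the count past the held value) and report how many
   were set so the caller can restart. */
static bool try_lock_all(atomic_uint *locks[], const unsigned seen[],
                         int n, int *nset) {
    for (int i = 0; i < n; i++) {
        unsigned expected = seen[i];                 /* snapshot value */
        if (!atomic_compare_exchange_strong(locks[i], &expected, seen[i] + 1)) {
            for (int j = 0; j < i; j++)              /* roll back */
                atomic_store(locks[j], seen[j] + 2); /* release bumps count */
            *nset = i;
            return false;
        }
    }
    *nset = n;
    return true;
}
```

A stale snapshot makes the very first compare-and-swap fail, so the process restarts without ever having published a partial operation.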

In order to ensure that the algorithm is non-blocking we have to provide some
method of completing the operation of a process which is already holding some locks
but is currently inactive. This is achieved by having the pointer in the lock point
to the operation information block of the operation being performed. This way, the
moment a process sets a lock, its intentions are clear to every other process. In
the equivalent locking algorithm, only the process that set the locks could do the
operation. In the non-blocking algorithm, after the first lock is set, the operation is
visible to everyone (through the lock), and anyone being blocked by the lock will do
the operation, i.e., they will attempt to set the remaining locks, make the changes
in the data structure, and release all the locks. When a blocked process tries to
complete the owner process's job, it must inform the owner of success or failure. In order to
do this, we include a variable, status, in the operation information block (which is
pointed to in the lock). The variable status also has a count attached to it (for the
same reasons) and initially is set to NO_OP. The count is untouched at this point.
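One possible packing of this status variable into a single 64-bit word, so that a wide compare-and-swap can update the code, the lock count, and the version count together, is sketched below. The bit widths are our choice, purely illustrative:

```c
#include <assert.h>
#include <stdint.h>

enum { NO_OP = 0, SUCCESSFUL = 1, FAILURE = 2 };

/* status = (var, count); var = (code, num). Packing everything into one
   word lets one wide compare-and-swap (CSDBL in the figures) change all
   three fields atomically. Field widths here are illustrative only. */
static uint64_t status_pack(unsigned code, unsigned num, uint32_t count) {
    return ((uint64_t)(code & 0xFF) << 56)
         | ((uint64_t)(num  & 0xFF) << 48)
         | count;
}

static unsigned status_code(uint64_t s)  { return (unsigned)(s >> 56) & 0xFF; }
static unsigned status_num(uint64_t s)   { return (unsigned)(s >> 48) & 0xFF; }
static uint32_t status_count(uint64_t s) { return (uint32_t)s; }
```

The num field matters on the FAILURE path described below: it tells helpers how many locks were actually set, and hence how many must be released.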

After the first lock is set by the owner process, other processes wanting to help
the owner take private snapshots of the operation information block of the owner
and try to execute the operation. The owner also tries to set all the locks (locking
phase), do the operations, and release all the locks (unlocking phase). While in the
locking phase, if it is unable to set a lock in its own favor, it checks to see if the lock
is already in its own favor (by checking if the lock is set and the lock pointer points
to its own operation information block). This is possible, because a faster process
may have read the owner's operation information and is ahead of the owner in the
locking phase. If the lock is in fact in the owner's favor, it continues locking and if it
gets through all the locks, it attempts to set its operation information block's status
to SUCCESSFUL (from NOOP). The operation is guaranteed of completion at this
point. As a matter of fact, the first process that gets through the complete locking
phase sets the status of the owner's opinfo block to SUCCESSFUL. Similarly, the first
process that fails in setting any lock because it was in some other operation's favor
aborts the locking phase, sets the owner's status to FAILURE (so that other processes
know what happened) and goes on to the unlocking phase. When it sets the owner's
status to FAILURE, it also records the number of locks it succeeded in setting before
it failed, so that only those locks may be released. So, status has two parts: var
and count, and var is divided into code (NO_OP, FAILURE, SUCCESSFUL) and
num, which holds the number of locks that were set. Also, if the process that set the
status to FAILURE did so because some lock was in another operation's favor,
then that process must try to unblock that lock too, after completing the unlocking
phase here. This unblocking of locks is recursive.

The complete flow chart of the non-blocking algorithm is shown in Figs. 3-7.
We now describe an example of FIFO queues with interior access, implemented as a
doubly linked list.


3.3 An Example: FIFO Queues with Interior Access

The locking scheme on which this algorithm is based is very simple. We lock each of
the objects in the queue that we need to modify (except for the trivial case when the
queue is empty). The direction of putting locks is from head to tail for every operation,
to eliminate the possibility of deadlock. For example, in a dequeue operation, two
locks need to be placed: first on the element pointed to by the head, and then on
the element next to it.

The data structures of the queue are shown in Fig. 8. We describe the interior
insert procedure and give the code for it (Fig. 9-11). The code for the remaining
procedures is very similar and is omitted.

In the interior insert procedure, objectptr points to the object in the queue to the
right of which we want to insert newobject. We first take a snapshot of Head, Tail,
objectptr->left, objectptr->right, objectptr->lockval, objectptr->right->left
and objectptr->right->lockval. In addition to capturing all the values of
the variables, we also read the addresses of these variables in the snapshot procedure.
Now, we can make decisions about the queue by looking at the captured variables.

One operation of the queue is done without locking. That operation is enqueuing
into an empty queue, because there is nothing to lock. This case is exactly the same
as the two stream FIFO queue. When the queue is empty, the first enqueue must be
done by first putting the tail to the object, and then putting the head to it. To make
this unique, we require that dequeuing the last object should require first shifting the
tail to NULL. So if the interior access procedure sees that the head is NULL but the
tail is not, it knows that this is an incomplete enqueue and it finishes the enqueue
by shifting the head to the object. Next, we check to see if any of the objects we
need are already locked. If the lock of the object pointed to by objectptr or that
of the object to its right is already set, we raise Exception1 to unblock those
locks. Now all the objects we need are unlocked. We now check to see if the object
is in the queue at all. If not, we restart. Note that except for the special case of an
empty queue enqueue, we make decisions only after we see that the objects we need
are unlocked. This is to ensure that we do not see the objects in an intermediate
state (in the middle of an operation) and make wrong decisions. Now we continue
and prepare the operation information for the various possibilities, which are:

* The object is the last element in the queue. In this case, the tail will have to
be modified too (to point to newobject).

* It is not the last in the queue. Instead of the tail being modified, we must set
objectptr->right->left to newobject and newobject->right to objectptr->right.

After preparing these operations, the rest of the procedure is the standard procedure
(described in the above subsection, and in the flow charts). We attempt to put
the locks, raise Exception1 if we are blocked while putting the first lock, and raise
Exception2 if we are blocked while putting subsequent locks which we find are not in
our favor.

The enqueue procedure is just a subset of the interior insert procedure, except that
we have to handle the empty queue case. We do that without locking, as mentioned
before. The variables we need to read in this case are Head, Tail, Head->left,
and Head->lockval. As in the interior insert procedure, we must lock the new object
too, because the moment it even partially enters the queue, other processes might
attempt to lock it.

In the interior delete procedure the variables that have to be read are Head,
Tail, objectptr->left, objectptr->right, objectptr->left->right,
objectptr->right->left, objectptr->lockval, objectptr->left->lockval, and
objectptr->right->lockval. The cases that could occur are:

* The object is not in the queue


* The object is the only element in the queue


* The object is at the tail

* The object is at the head

* The object is somewhere else.


Again, for each of the cases, we simply prepare the operations and proceed as before.

The dequeue procedure is simply a subset of the interior delete procedure. In
that, the variables we need to read are Head, Tail, Head->right, Head->right->left,
Head->lockval, and Head->right->lockval. The cases that could occur are:


* The queue is empty.

* Only one element in the queue.

* More than one element in the queue.


We proceed just as in the previous cases.



3.4 Correctness of Transformed Algorithms


First of all, we assume that the operations defined for the data structure are correct.
For example, we assume that the enqueue, dequeue, interior insert, and interior delete
operations (of the FIFO queue with interior access) are correct when used by a single
process to modify the queue (in the absence of all other processes, and without any
locking scheme). The correctness of the obtained non-blocking algorithms is seen
since the algorithms observe the following safety properties:


* No modification of any part of the data structure is made without an operation
obtaining all the locks to every part of the data structure it needs to modify
(except possibly for some trivial cases, e.g. the empty queue enqueue in the
previous example, which is easily proved correct).

* A lock cannot be set if it was changed after it was last read.

* Each action within an operation (e.g. the changing of a pointer, releasing a lock,
changing the status in an operation information block) can be done exactly
once.


and the following liveness property:


* The only way a process can be stopped from performing an operation is by
another process performing an operation.


The first and second safety properties ensure that the defined operations (e.g. enqueue,
dequeue in a FIFO queue) are atomically executed in an environment of concurrent
processes. Assuming that we have read the data structure correctly (in the FIFO
queue with interior access, this translates to the snapshot being correct), we can
commit the operation to the data structure only if the part of it we want to modify
has not changed since we last read it, because only then can we lock it. No one else
can do their operations on it while we are doing our operation, because of the first
safety property. However, this simply proves the locking algorithm correct. In the
non-blocking scheme, the difference is that any process may do the operation for
the owner once the operation becomes public. The third safety property takes into
consideration this point by claiming that each action of an operation can be done only
once. So once an operation is completed (successfully or unsuccessfully), no process
which had made a copy of the operation earlier can do anything to the data structure.

The liveness property simply guarantees that some work will actually be done.
This is because we assumed no deadlock, so it is impossible for operations to stop
each other in a cycle. Some operation must complete, so the system of concurrent
processes as a whole must progress.

The proof of the first safety property is simple, since the only way a modification
will be made is when the status of the operation in the owner's operation information
block indicates success. The only way status can be set is if all the locks indicated in
the operation are set, either by the owner, or by other processes.

The proof of the second property is evident because we use the compare-and-swap
instruction to modify all shared variables.

The third property is also obeyed because we use compare-and-swap to modify any
shared variable. So, all operations can be done only once. Also, the status in the
owner's operation information block can be changed only once. This is because the
owner keeps a copy of the status before making its operation information public, and
uses that initial value when trying to change it using the compare-and-swap instruction.
Other processes, when they take a snapshot of the owner's operation information block,
read the copied status, and are in a position to go back and change the owner's status
only if they copied the initial status (they use the compare-and-swap operation too).
This ensures that the status of the owner's operation information cannot be changed
after the operation is over (and cause problems when the operation information block

is being used for another operation). The locks too can be set and released only a
single time, because we make the initial values of the locks available to every process
through the operation information block, and they use these values while doing a
compare-and-swap to modify the locks.

The liveness property is easily proved by contradiction. If no other process is
changing the part of the data structure that one process is attempting to modify,
then it must succeed in locking it, and so must be able to complete its operation.



3.5 Analysis

The obtained algorithms are evidently non-blocking since a process can always help
out another which may be blocking it. Just as in the single stream non-blocking
algorithms, these algorithms have a system latency of O(n), assuming n concurrent
processes. When a process gets a lock, it may cause all remaining processes to reread
the data structure (after helping it finish its operation). The achieved concurrency
of the algorithm is evidently the same as the equivalent locking algorithm. Also,
the total memory requirement for the algorithms is O(n) since each process needs a
constant number of operation information blocks, and objects.

There is one notable characteristic of the obtained non-blocking algorithms that is
different from the single stream non-blocking algorithms and the two stream algorithm
in [16]. The point at which the operation is committed to the data structure is not
always executed by the owner process. This point is the putting of the final lock.
This may create problems if the process fails and is not able to recover all its state.
However, the problem can be easily alleviated by putting an additional variable in
the operation information block which indicates whether the operation has been
prepared. Once the operation has been prepared, this variable is set to TRUE
(initially it is FALSE). Now, if a process fails, all it needs to do is to recover its
operation information block. If the operation has not been prepared, it can do
whatever it wants. If it has been prepared, it must try to execute it, and only when
it finishes can it decide whether or not the operation has already been committed
to the data structure.
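The recovery refinement just described can be sketched in C; the flag and function names below are ours, not the paper's:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch: a 'prepared' flag (our name) added to the operation information
   block. A process recovering from a failure inspects only its own block:
   an unprepared operation never went public, so the process may do
   anything; a prepared one must be (re)executed before the process can
   tell whether it had already been committed. */
struct opinfo_hdr {
    bool prepared;  /* FALSE initially; set TRUE once the op is prepared */
    /* ... locks, addresses, old/new values as before ... */
};

static const char *recovery_action(const struct opinfo_hdr *op) {
    return op->prepared
        ? "re-execute the operation, then check whether it was committed"
        : "operation never went public; safe to discard";
}
```

The key point is that recovery needs no global scan: everything required to decide is in the process's own operation information block.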



4 Conclusion


In this paper, we have presented methods for designing single and multiple stream
non-blocking algorithms for concurrent data structures. These algorithms have a
system latency of O(n) and a total memory requirement of O(n), where n is the
number of concurrent processes. We have concentrated on algorithms for lists, though
the methods can be applied to other data structures as well.

Directions for future work include:


* Performance evaluation of the presented multiple stream non-blocking algo-
rithms, and a comparison with equivalent locking algorithms.

* Looking into methods of making these algorithms wait-free, since non-blocking
algorithms cannot prevent starvation of individual processes.

* Application of the presented methods to other data structures, especially trees.

* Extending the transformation of locking algorithms to non-blocking algorithms
to a general locking algorithm, not just a locking algorithm for a data structure.



References

[1] C.S. Ellis and T.J. Olson. Concurrent Dynamic Storage Allocation. In Proceed-
ings of the 1987 International Conference on Parallel Processing, pp. 502-511.

[2] S.F. Hummel. SMARTS - Shared-memory Multiprocessor Ada Run Time Super-
visor. Technical Report 495, NYU, February 1990.

[3] Y. Mond and Y. Raz. Concurrency Control in B+-Trees Databases using Prepara-
tory Operations. 11th International Conference on Very Large Databases, pp.
331-334, August 1985.

[4] P. Tang, P.-C. Yew, C.-Q. Zhu. A Parallel Linked List for Shared-Memory Mul-
tiprocessors. In Proceedings of 13th Annual International Computer Software &
Applications Conference, pp. 130-135, September 1989.

[5] M. Herlihy. A Methodology for Implementing Highly Concurrent Data Struc-
tures. In Proceedings of the 2nd ACM SIGPLAN Symposium on the Principles
and Practice of Parallel Programming, pp. 197-206, March 1990.

[6] J. Wilson. Operating System Data Structures for Shared-memory (MIMD) Ma-
chines with Fetch-and-Add. Ph.D. Thesis, NYU, 1988.

[7] A. Gottlieb, B.D. Lubachevsky, and L. Rudolph. Basic Techniques for the Effi-
cient Coordination of Very Large Numbers of Cooperating Sequential Processors.
ACM Transactions on Programming Languages and Systems, 5(2):164-189, April
1983.

[8] J. M. Stone. A Simple and Correct Shared-Queue Algorithm Using Compare-
and-Swap. In Proceedings of the IEEE Computer Society and ACM SIGARCH
Supercomputing '90 Conference, November 1990.

[9] L. Lamport. Specifying Concurrent Program Modules. ACM Transactions on
Programming Languages and Systems, 5(2):190-222, April 1983.

[10] M. Herlihy and J. Wing. Axioms for Concurrent Objects. In 14th ACM Sympo-
sium on Principles of Programming Languages, pp. 13-26, January 1987.

[11] R. Kent Treiber. Systems Programming: Coping with Parallelism. IBM Almaden
Research Center, San Jose, California, RJ 5118, April 1986.

[12] M.P. Herlihy. Impossibility and Universality Results for Wait-Free Synchroniza-
tion. Seventh ACM SIGACT-SIGOPS Symposium on Principles of Distributed
Computing, pp. 276-290, August 1988.

[13] IBM T.J. Watson Research Center, Yorktown Heights, New York. System/370
Principles of Operation, pp. 7-13,14, May 1983.

[14] C.-Q. Zhu and P.-C. Yew. A Synchronization Scheme and its Applications for
Large Multiprocessor Systems. Proceedings of the 4th International Conference
on Distributed Computing Systems, pp. 486-493, May 1984.

[15] D. Shasha and N. Goodman. Concurrent Search Structure Algorithms. ACM
Transactions on Database Systems, 13(1):53-90, March 1988.

[16] S. Prakash, Y.H. Lee, and T. Johnson. A Non-Blocking Algorithm for Shared
Queues using Compare-and-Swap. To appear in 1991 International Conference
on Parallel Processing, St. Charles, Illinois, August 1991.

Structure object_pointer
    ptr    : pointer_to_object
    count  : int

Structure opinfo_pointer
    ptr    : pointer_to_opinfo
    count  : int

Structure object
    data   : Structure_of_data
    left   : object_pointer
    right  : object_pointer

Structure opinfo
    number        : int
    address[MAX]  : address_of_object_pointer
    oldval[MAX]   : object_pointer
    newval[MAX]   : pointer_to_object

Shared Variables
    Head        : object_pointer
    Tail        : object_pointer
    Current_op  : opinfo_pointer

Figure 1: Data Structures for the Single Stream FIFO Queue











Procedure Enqueue(objectptr)

Private Variables
    my_op        : pointer_to_opinfo
    mycurrentop  : opinfo_pointer
    tempop       : opinfo_pointer
    opexec       : opinfo_pointer
    objectptr    : pointer_to_object
    headcopy     : object_pointer
    tailcopy     : object_pointer
    rightcopy    : object_pointer
    rightaddr    : address_of_object_pointer

Assume the object to be enqueued is pointed to by objectptr

success = FALSE
repeat
    Read Current_op into mycurrentop
    If (mycurrentop.ptr = NULL) then
        Read Head into headcopy
        Read Tail into tailcopy
        If (tailcopy.ptr != NULL) then
            Read Tail.ptr->right into rightcopy
            Read address of Tail.ptr->right into rightaddr
        If (mycurrentop = Read(Current_op)) then

            No one touched the queue while this process was reading the
            variables, so the read was atomic

            If (tailcopy.ptr = NULL) then

                Queue is empty, so prepare to do an empty queue enqueue

                information to shift head to object
                my_op->address[1] = address(Head)
                my_op->oldval[1] = headcopy
                my_op->newval[1] = objectptr

                information to shift tail to object
                my_op->address[2] = address(Tail)
                my_op->oldval[2] = tailcopy
                my_op->newval[2] = objectptr

                my_op->number = 2

            else

                there are one or more elements in the queue

                information to shift tail
                my_op->address[1] = address(Tail)
                my_op->oldval[1] = tailcopy
                my_op->newval[1] = objectptr

                information to shift Tail.ptr->right to object
                my_op->address[2] = rightaddr
                my_op->oldval[2] = rightcopy
                my_op->newval[2] = objectptr

                information to shift objectptr->left to Tail
                my_op->address[3] = address(objectptr->left)
                my_op->oldval[3] = objectptr->left
                my_op->newval[3] = tailcopy.ptr

                my_op->number = 3

            Now the operation is ready, so try to make it current
            so that we can continue executing it

            cc = CSDBL(mycurrentop.ptr, mycurrentop.count, my_op,
                       mycurrentop.count+1, Current_op)

            if (cc = SUCCESS) then success = TRUE

    Make a copy of the opinfo pointed to by Current_op

    Read Current_op into tempop
    If (tempop.ptr != NULL) then
        copy info into block pointed to by opexec
        If (tempop = Read(Current_op)) then

            Execute operations
            for i = 1 to opexec->number do
                CSDBL(oldval[i].ptr, oldval[i].count, newval[i],
                      oldval[i].count+1, address[i])

            Now set Current_op to NULL, indicating op completion
            CSDBL(tempop.ptr, tempop.count, NULL,
                  tempop.count+1, Current_op)

until success

end Procedure Enqueue

Figure 2: Enqueue Procedure for Interior Access FIFO Queue

[Flow chart omitted: prepare the operation privately; "Go to Start" on failed checks]

Figure 3: Flow Chart of Non-Blocking Algorithm: Part 1














[Flow chart omitted: on a blocked lock, raise the exception, then go to Start]

Figure 4: Flow Chart of Non-Blocking Algorithm: Part 2





















[Flow chart omitted: EXCEPTION 1 completes operations of processes holding our
locks; ends by setting status = SUCCESSFUL]

Figure 5: Flow Chart of Procedure Exception1: Part 1





















[Flow chart omitted: continuation of Exception1; handles the NO_OP status case]

Figure 6: Flow Chart of Procedure Exception1: Part 2

































[Flow chart omitted: EXCEPTION 2 handles cases when one or more locks have been
successfully put, and then no more can be put]

Figure 7: Flow Chart of Procedure Exception2














Structure object_pointer
    ptr    : pointer_to_object
    count  : int

Structure opinfo_pointer
    ptr    : pointer_to_opinfo
    count  : int

Structure lockstruc
    lock   : consists of lock.l and lock.n; l is the lock, n is the count
    ptr    : pointer_to_opinfo

Structure csvar
    var    : int; var is divided into var.code and var.num;
             code has the actual status and num holds the number of locks
    count  : int

Structure object
    data     : Structure_of_data
    lockval  : lockstruc
    left     : object_pointer
    right    : object_pointer

Structure opinfo
    num_of_locks    : int
    num_of_vars     : int
    status          : csvar
    address[MAX]    : address_of_object_pointer
    oldval[MAX]     : object_pointer
    newval[MAX]     : pointer_to_object
    lockval[MAX1]   : lockstruc
    lockaddr[MAX1]  : address_of_lockstruc

Shared Variables
    Head  : object_pointer
    Tail  : object_pointer

Private Variables
    my_op      : pointer_to_opinfo
    newobject  : pointer_to_object
    objectptr  : pointer_to_object

Figure 8: Data Structures for the Multiple Stream FIFO Queue









Procedure Interior_Insert(objectptr, newobject)

newobject is to be inserted to the right of objectptr

First get an opinfo block from the stack;
this will be used to store the operation information
    pop_nb_stack(my_op)

Take a snapshot of the head, tail, the initial values of
the locks and the initial values of the pointers to be changed
in the object (pointed to by objectptr) and the object
to the right of objectptr

1:  Snapshot(Head, Tail, objectptr->left, objectptr->right,
             objectptr->lockval, objectptr->right->left,
             objectptr->right->lockval)

The snapshot is taken by reading each of the variables and then reading
them again to ensure that they are unchanged and the read was atomic.
The values of the variables are returned in my_head, my_tail, obj_l,
obj_r, obj_lock, obj_r_l, obj_r_lock, when a successful atomic
read occurs. In addition, the addresses of the locks and pointers are
returned in a_[the variable]

If we find that the first element is being enqueued, complete the enqueue
    If (my_head.ptr = NULL and my_tail.ptr != NULL)
        CSDBL(my_head.ptr, my_head.count, my_tail.ptr, my_head.count+1,
              Head)
        Start over again from 1

Check to see if any locks are already set. If so, raise an exception
so that the operation indicated by the lock is executed
    bool = FALSE
    If (obj_lock.l = LOCK_SET)
        Exception1(a_obj_lock)
        bool = TRUE
    If (obj_r_lock.l = LOCK_SET)
        Exception1(a_obj_r_lock)
        bool = TRUE
    If (bool = TRUE)
        Start over again from 1

Check to see if the object (objectptr) is in the queue at all
    If (obj_l.ptr = NULL and obj_r.ptr = NULL and
        Tail != objectptr)
        return OBJECT_NOT_IN_Q

Prepare the opinfo block

Put initial values and addresses of locks in opinfo block
    my_op->lockval[1] = obj_lock
    my_op->lockaddr[1] = a_obj_lock

Even newobject will need to be locked since we don't want people
changing it until it is completely in the queue
    my_op->lockval[2] = newobject->lockval
    my_op->lockaddr[2] = addressof(newobject->lockval)
    my_op->num_of_locks = 2

If (obj_r.ptr = NULL)

    This is the case when objectptr is the last object in the list

    my_op->num_of_locks = 2

    Now put in the initial and new values of the variables that need
    to be changed

    Tail will now point to newobject
        my_op->oldval[1] = my_tail
        my_op->newval[1] = newobject
        my_op->address[1] = a_my_tail

    objectptr->right now points to newobject
        my_op->oldval[2] = obj_r
        my_op->newval[2] = newobject
        my_op->address[2] = a_obj_r

    newobject->left now points to objectptr
        my_op->oldval[3] = newobject->left
        my_op->newval[3] = objectptr
        my_op->address[3] = addressof(newobject->left)
        my_op->num_of_vars = 3

else

    In this case the object is somewhere in the middle of the queue

    We will need to lock objectptr->right too
        my_op->lockval[3] = obj_r_lock
        my_op->lockaddr[3] = a_obj_r_lock
        my_op->num_of_locks = 3

    Put in the initial and new values of the variables that need
    to be changed

    objectptr->right now points to newobject
        my_op->oldval[1] = obj_r
        my_op->newval[1] = newobject
        my_op->address[1] = a_obj_r

    newobject->left now points to objectptr
        my_op->oldval[2] = newobject->left
        my_op->newval[2] = objectptr
        my_op->address[2] = addressof(newobject->left)

    newobject->right now points to objectptr->right
        my_op->oldval[3] = newobject->right
        my_op->newval[3] = obj_r.ptr
        my_op->address[3] = addressof(newobject->right)

    objectptr->right->left now points to newobject
        my_op->oldval[4] = obj_r_l
        my_op->newval[4] = newobject
        my_op->address[4] = a_obj_r_l
        my_op->num_of_vars = 4

Now that we have made a mock run of the algorithm, and all required
data is ready in the opinfo block, try to commit the changes by
attempting to put the locks

First initialize and keep a copy of the status
    my_op->status.var = NO_OP || 0
    tempstat = my_op->status
Note that we did not touch the count of the status. It is initialized
only at startup time, and never again. It is only incremented

Put first lock
    cc = CSDBL(my_op->lockval[1].lock, my_op->lockval[1].ptr,
               LOCK_SET || my_op->lockval[1].lock.n + 1, my_op,
               *(my_op->lockaddr[1]))

If we fail in putting the first lock, then nothing has gone public yet,
so we need not put any indication in my_op->status or unlock
any locks. We just need to raise the exception which attends to the
operation indicated in the offending lock
    If (cc != SUCCESS)
        templock = *(my_op->lockaddr[1])
        If (templock.ptr != my_op)
            Only if the lock that blocked us is still in place do we need to
            execute the exception
            Exception1(my_op->lockaddr[1])
        Start over again from 1

Now we can attempt to put the remaining locks. Note that someone
else may already be doing it for us (if blocked by our first lock)
    for i = 2 to my_op->num_of_locks do
        cc = CSDBL(my_op->lockval[i].lock, my_op->lockval[i].ptr,
                   LOCK_SET || my_op->lockval[i].lock.n + 1, my_op,
                   *(my_op->lockaddr[i]))
        If (cc != SUCCESS)
            templock = *(my_op->lockaddr[i])
            If (templock.ptr != my_op)
                Raise an exception. Send the address and the value
                of the offending lock to the exception procedure
                If (Exception2(templock, my_op->lockaddr[i], my_op, i)
                    = SUCCESS)
                    return
                else
                    start over again from 1

Now, all the locks are set in our favor. This guarantees that our
operation will definitely be executed, so put that indication
in the status
    CSDBL(tempstat.var, tempstat.count,
          SUCCESSFUL || my_op->num_of_locks,
          tempstat.count+1, my_op->status)
Note that someone else may have put the success indicator for us,
if he put our locks faster than we could

Now, simply do the operations indicated in our opinfo block
    for i = 1 to my_op->num_of_vars do
        CSDBL(my_op->oldval[i].ptr, my_op->oldval[i].count,
              my_op->newval[i], my_op->oldval[i].count+1,
              *(my_op->address[i]))

Release locks
    for i = 1 to my_op->num_of_locks do
        cc = CSDBL(LOCK_SET || my_op->lockval[i].lock.n + 1,
                   my_op, LOCK_NOT_SET || my_op->lockval[i].lock.n + 2,
                   NULL, *(my_op->lockaddr[i]))

Return opinfo block to the shared stack
    push_nb_stack(my_op)

end Procedure Interior_Insert

Figure 9: The Interior Insert Procedure









Procedure Exception1(lockaddr)

We have to unlock the lock, so make a private copy of the opinfo
of the operation that locked it
    succ = FALSE
    repeat
        tempcopy = *(lockaddr)
        If (tempcopy.lock.l = LOCK_SET)
            Copy the block pointed to by tempcopy.ptr into private block
            pointed to by tempop
        else
            return
        If (tempcopy = *(lockaddr))
            succ = TRUE
    until succ

The above took a snapshot of the opinfo while the lock still had a
pointer to it. If the lock was unlocked at any time, we just return.

At this point tempcopy.ptr points to the actual opinfo block and tempop
to our copy of it

Check the status of the operation

    If (tempop->status.var.code = SUCCESSFUL)

        This operation has successfully put all its locks. So, simply
        do its operations and unlock its locks

        for i = 1 to tempop->num_of_vars do
            CSDBL(tempop->oldval[i].ptr, tempop->oldval[i].count,
                  tempop->newval[i], tempop->oldval[i].count+1,
                  *(tempop->address[i]))

        for i = 1 to tempop->num_of_locks do
            cc = CSDBL(LOCK_SET || tempop->lockval[i].lock.n + 1,
                       tempcopy.ptr,
                       LOCK_NOT_SET || tempop->lockval[i].lock.n + 2,
                       NULL, *(tempop->lockaddr[i]))

    else if (tempop->status.var.code = FAILURE)
        Simply remove the locks that have been put and return
        for i = 1 to tempop->status.var.num do
            cc = CSDBL(LOCK_SET || tempop->lockval[i].lock.n + 1,
                       tempcopy.ptr,
                       LOCK_NOT_SET || tempop->lockval[i].lock.n + 2,
                       NULL, *(tempop->lockaddr[i]))

    else
        The locks are still being put. Help in putting them

        for i = 1 to tempop->num_of_locks do
            cc = CSDBL(tempop->lockval[i].lock, tempop->lockval[i].ptr,
                       LOCK_SET || tempop->lockval[i].lock.n + 1,
                       tempcopy.ptr, *(tempop->lockaddr[i]))

            If (cc != SUCCESS)
                Check if lock is already in our favor
                tmp = *(tempop->lockaddr[i])
                If not (tmp.lock.l = LOCK_SET and
                        tmp.lock.n = tempop->lockval[i].lock.n + 1 and
                        tmp.ptr = tempcopy.ptr)

                    If we find that the lock is not already in our favor,
                    then either someone has finished the operation
                    successfully and set the status indicator and is now
                    unlocking, or this lock has been claimed by a third party

                    Set the status indicator to FAILURE. This will only
                    succeed in the second case (of the previous comment)
                    CSDBL(tempop->status.var, tempop->status.count,
                          FAILURE || (i - 1), tempop->status.count + 1,
                          tempcopy.ptr->status)

                    Unlock the locks that have been set
                    newstatus = tempcopy.ptr->status
                    If (newstatus.count = tempop->status.count + 1)
                        for i = 1 to newstatus.var.num do
                            cc = CSDBL(LOCK_SET || tempop->lockval[i].lock.n + 1,
                                       tempcopy.ptr,
                                       LOCK_NOT_SET || tempop->lockval[i].lock.n + 2,
                                       NULL, *(tempop->lockaddr[i]))

                    If we were the ones to find the lock in someone else's
                    favor, then we have to pursue the matter further
                    If (tmp.lock.l = LOCK_SET and tmp.ptr != tempcopy.ptr)
                        Exception1(tempop->lockaddr[i])

                    return

        We have now set all the locks, so set indicator to success

        CSDBL(tempop->status.var, tempop->status.count,
              SUCCESSFUL || tempop->num_of_locks,
              tempop->status.count + 1, tempcopy.ptr->status)

        Now do the indicated operations
        for i = 1 to tempop->num_of_vars do
            CSDBL(tempop->oldval[i].ptr, tempop->oldval[i].count,
                  tempop->newval[i], tempop->oldval[i].count+1,
                  *(tempop->address[i]))

        Now unlock the locks
        for i = 1 to tempop->num_of_locks do
            cc = CSDBL(LOCK_SET || tempop->lockval[i].lock.n + 1,
                       tempcopy.ptr,
                       LOCK_NOT_SET || tempop->lockval[i].lock.n + 2,
                       NULL, *(tempop->lockaddr[i]))

end Procedure Exception1

Figure 10: Procedure Exception1









Procedure Exception2 (lockval, lockaddr, my_op, k)

tempstat = my_op→ status

Check to see if someone has completed our work
If (tempstat.var.code = SUCCESSFUL)
someone has already completed our job and is unlocking the locks
so help in the unlocking and return
for i = 1 to my_op→ num_of_locks do
cc = CSDBL (LOCK_SET || my_op→ lockval[i].lock.n +1,
my_op, LOCK_NOT_SET || my_op→ lockval[i].lock.n +2,
NULL, *(my_op→ lockaddr[i]))

return SUCCESS
else
tempstat.var.code is FAILURE or NO_OP. Try to mark the operation
as failed, then proceed to unlock
If (tempstat.var.code = NO_OP)
CSDBL (tempstat.var, tempstat.count, FAILURE||(k - 1),
tempstat.count + 1, my_op→ status)

for i = 1 to my_op→ status.var.num do
cc = CSDBL (LOCK_SET || my_op→ lockval[i].lock.n +1,
my_op, LOCK_NOT_SET || my_op→ lockval[i].lock.n +2,
NULL, *(my_op→ lockaddr[i]))

Now, if lockval.lock.l was equal to LOCK_SET, then we
still have to try to unblock the offending lock
If (lockval.lock.l = LOCK_SET)
Exception1 (lockaddr)

return NO_SUCCESS

end Procedure Exception2


Figure 11: Procedure Exception2



