The Performance of Holding Versus Releasing Locks in a
Multiprogrammed Multiprocessor
Theodore Johnson Krishna Harathi
Dept. of Computer and Information Science
University of Florida
Abstract
In a multiprogrammed multiprocessor system, existing lock-based mechanisms hold the lock for a task during a context switch. In this case, the time that a task holds a lock can be greatly increased, resulting in wasted lock utilization and an increase in the average response time for tasks using the lock. One option for avoiding this problem is to release the lock held by a task during a context switch. The lock is reacquired when the task executes in its next quantum. In this paper, we compare the performance of releasing versus holding a lock when a context switch occurs during a critical section. We describe the ICSM-R algorithm and its implementation on a multiprocessor system. Next, we develop an analytical
performance model, and validate it with a simulation. We study the various parameters under which
the ICSM-R algorithm outperforms lock-based algorithms. We find that if the critical section execution
time is 50% or less of the time quantum, ICSM-R has better performance than locking alone, reducing
response times and improving scalability.
1 Introduction
Mutual exclusion is a significant problem for sharing a resource in a multiprogrammed shared-memory
multiprocessor system. Existing lock-based synchronization mechanisms hold the lock for a task even after a context switch, when the task is not executing. In such an event, tasks on other processors waiting for the lock will be blocked for an amount of time much larger than the critical section execution time. As a result, the lock utilization and the response times are needlessly increased. This bottleneck will be more
pronounced as the number of processors and the degree of multiprogramming are increased. In spite of this
problem, multiprogramming is often used in shared memory multiprocessors, in order to improve processor
utilization or perform background tasks (such as asynchronous I/O).
One way to avoid this problem with multiprogramming is to require a task to release all of its locks
before a context switch. In this paper, we analyze the performance of the ICSM-R algorithm. In the ICSM-R algorithm, a task releases its locks on a context switch. When the task is rescheduled on the processor, the lock is reacquired and a test for a conflicting critical section access is performed. If a conflicting access completed in the interim, the task restarts its execution of the critical section. The ICSM-R algorithm does not block tasks needlessly, but it pays a penalty for restarting the critical section after conflicting accesses by other tasks. However, the critical section response time will be low if the number of restarts of the critical section is small. We study the parameters and conditions under which the ICSM-R algorithm performs better than an algorithm that does not release the lock during a context switch.
There has recently been a great deal of interest in the problem of handling critical sections in a multiprogrammed shared memory multiprocessor. Anderson et al. [2] show that a naive implementation of spin-locks can delay not only the processor waiting for a lock, but also other processors doing work. They suggest an
Ethernet-style backoff scheme or a queue-based algorithm for reducing the cost of spin-waiting. McCann et
al. [11] conclude that preempting processors in a coordinated way is critical to response times while using
critical sections. Anderson et al. [3] argue that the operating system should recognize that a preempted
thread is executing in a critical section, and execute the preempted thread until the thread exits the critical
section. An approach that is related to the ICSM-R algorithm is the use of non-blocking algorithms. Alemany and Felten [1] consider implementation issues of non-blocking concurrent objects on shared-memory multiprocessors. They show how the resources wasted by non-blocking operations that fail, and the cost of the data copying required by a non-blocking implementation, can be reduced by relying on operating system support. Bershad [4] discusses two approaches for implementing kernel-level support for non-blocking critical sections.
Interest in preemptable locks has recently developed in the real-time systems community. Takada and Sakamura [16] proposed algorithms that extend queuing spin-locks to allow preemption for servicing interrupts. They address the conflicting requirements of servicing a pending interrupt while holding a lock. Shu et al. [15] proposed the Abort Ceiling Protocol, an extension to the Priority Ceiling Protocol [14]. In this algorithm, an abort ceiling priority is associated with a task. Another task may abort the currently running task and run immediately if its priority is higher than the current abort ceiling. The protocol relies on Interruptible Critical Sections to restart the critical section of the aborted task. The Ceiling Abort Protocol [17] proposed by Takada and Sakamura is a similar extension to the Priority Ceiling Protocol. This protocol assigns an abort ceiling priority to the critical section instead.
The contribution of this work is to provide a detailed analytical performance model of both the lock holding and the lock releasing strategies. Some previous works have pointed out the benefit of a lock releasing strategy [3, 1], but did not provide an analytical model. Using the analytical model, we make a detailed comparison of lock holding and lock releasing, and find that a lock releasing strategy is better than a lock holding strategy if the critical section execution time is 50% or less of the time quantum.
2 The ICSM-R Algorithm
We introduced the idea of an Interruptible Critical Section (ICS) [9], which is a critical section protected
by optimistic concurrency control instead of by blocking. A task calculates its modifications to the shared
data structure, then attempts to commit its modification. If a higher priority task previously committed
a conflicting modification, the lower priority task fails to commit, and must try again (as in optimistic
concurrency control [5]). Otherwise, the task succeeds and continues with its work. In [7], the ICS algorithm is extended to the ICSM-R algorithm, which executes on multiprocessors.
In an interruptible critical section, a process can perform only one write that is visible to other processes.
Furthermore, the globally visible write must be the last instruction in the protected region. Therefore, a
process that is executing an ICS records its updates in a private buffer (the commit buffer). The final write
commits the updates that are recorded in the buffer by setting a commit flag. Any subsequent process that
executes the ICS performs the updates and clears the commit flag.
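The commit protocol described above can be illustrated with a small, single-threaded Python sketch. It is an assumption-laden illustration rather than the authors' implementation: the names ICSObject, try_commit, and increment are hypothetical; a conflict is detected with an execution counter, so any earlier commit (not only one by a higher priority task) forces a retry; and the check-then-commit step is assumed to be atomic, which a real multiprocessor implementation would have to guarantee.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ICSObject:
    """Hypothetical shared object protected by an interruptible critical section."""
    data: Dict[str, int] = field(default_factory=dict)           # shared data
    commit_buffer: Dict[str, int] = field(default_factory=dict)  # last committed updates
    commit_flag: bool = False  # set by the single globally visible write
    executions: int = 0        # number of successful commits so far

    def finish_pending_commit(self) -> None:
        """Apply committed-but-unapplied updates, then clear the commit flag."""
        if self.commit_flag:
            self.data.update(self.commit_buffer)
            self.commit_buffer = {}
            self.commit_flag = False


def try_commit(obj: ICSObject, seen_executions: int, updates: Dict[str, int]) -> bool:
    """Publish `updates` unless another task committed after `seen_executions`.

    Assumed atomic here; a real implementation needs hardware or lock support.
    """
    if obj.executions != seen_executions:
        return False                      # conflicting commit: caller must retry
    obj.commit_buffer = dict(updates)
    obj.executions += 1
    obj.commit_flag = True                # the last write makes the updates visible
    return True


def increment(obj: ICSObject, key: str,
              interfere: Optional[Callable[[], object]] = None) -> int:
    """Optimistically increment obj.data[key]; return the number of attempts.

    `interfere`, if given, runs once between the private computation and the
    commit, simulating another task that enters the ICS in the meantime.
    """
    attempts = 0
    while True:
        attempts += 1
        obj.finish_pending_commit()                  # complete any earlier commit first
        seen = obj.executions
        updates = {key: obj.data.get(key, 0) + 1}   # computed in a private buffer
        if interfere is not None:
            interfere()
            interfere = None
        if try_commit(obj, seen, updates):
            return attempts


shared = ICSObject()
attempts = increment(shared, "count", interfere=lambda: increment(shared, "count"))
shared.finish_pending_commit()
print(shared.data["count"], attempts)   # prints: 2 2 (the interrupted task retried once)
```

Setting `commit_flag` plays the role of the single globally visible write; the next task to enter the section applies the buffered updates before doing its own work.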
Related approaches to optimistic synchronization are discussed by Alemany and Felten [1] and by Bershad [4]. However, these protocols require that if a context switch occurs while a task is executing a critical section, the operating system kernel must either complete the critical section execution or back out of it. Requiring the kernel to perform the execution on behalf of the task has many difficulties: the task must pass a great deal of information to the kernel to initialize a critical section, there is great potential for security problems and kernel corruption, the critical section execution might not terminate, and the lengthy context switch can cause timing problems. By contrast, the ICSM-R algorithm pushes most of the work of implementing releasable critical sections to the user level. The code to support ICSM-R is easy to implement, and we present our implementation results in the next section. Other authors suggest that all tasks in a
program be scheduled for execution simultaneously [11]. However, synchronizing the simultaneous context
switches can be difficult to implement, and does not account for context switches due to background tasks.
We discuss the ICSM-R algorithm as implemented in a VMEexec [13] system development environment with a pSOS+ [12] real-time, multi-tasking operating system kernel. The VMEexec system consists of a host VMEmodule running the SYSTEM V/68 operating system and a set of VMEmodule target processors running the pSOS+ kernel. In our configuration, we have six MVME147 VMEmodules based on the Motorola MC68030, with shared memory on each module. One VMEmodule is used as a host processor running SYSTEM V/68, and the rest are real-time target processors running the pSOS+ kernel.
pSOS+ is a real-time, multi-tasking kernel that supports multi-processors. It provides a rich set of
system services including task management, shared-memory regions, synchronous / asynchronous signals,
semaphores, and messages. One feature of pSOS+ that we rely on is its support for user-written routines that are called at the start of a task, during a context switch, and at the end of a task. This feature allows us to
implement ICS support without modifying the kernel.
We implemented ICSM-R with spin-locks using the Test&Set instruction. Two data structures are needed to implement ICSM-R: one for the critical section, and one for each task that uses the critical section.
The global lock structure consists of a critical section identifier, a counter that tracks the number of times the critical section has been executed, and the critical section bounds. It also contains the address of the global spin-lock variable. The structure local to a task consists of a copy of the ICS execution count, a count of the number of times the critical section is retried on any invocation (for statistics), a pointer to the global lock structure (ICSLstruct), and a flag to indicate that the task is entering the critical section.
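As a rough sketch of the two data structures and a Test&Set spin-lock, the Python below mirrors the fields listed above. Python has no Test&Set instruction, so the hypothetical SpinWord class emulates one with a threading.Lock; all class and field names are illustrative, and incrementing the execution counter when a task leaves the critical section is our assumption about when the counter is updated.

```python
import threading
import time
from dataclasses import dataclass, field


class SpinWord:
    """Emulated memory word with an atomic Test&Set (Python has no such
    instruction, so a threading.Lock makes the read-modify-write atomic)."""

    def __init__(self) -> None:
        self._value = 0
        self._atomic = threading.Lock()

    def test_and_set(self) -> int:
        with self._atomic:
            old, self._value = self._value, 1
            return old

    def clear(self) -> None:
        with self._atomic:
            self._value = 0


@dataclass
class GlobalICSLock:
    """Global lock structure (hypothetical field names)."""
    cs_id: int
    cs_start: int          # critical section bounds, as code addresses
    cs_end: int
    exec_count: int = 0    # times the critical section has completed
    spin_lock: SpinWord = field(default_factory=SpinWord)


@dataclass
class TaskICSState:
    """Per-task structure (hypothetical field names)."""
    lock: GlobalICSLock
    exec_count_copy: int = 0   # copy of lock.exec_count taken on entry
    retries: int = 0           # statistics: restarts during this invocation
    entering: bool = False     # set while the task is entering the section


def enter(state: TaskICSState) -> None:
    state.entering = True
    while state.lock.spin_lock.test_and_set() == 1:
        time.sleep(0)          # spin; sleep(0) lets other Python threads run
    state.exec_count_copy = state.lock.exec_count
    state.entering = False


def leave(state: TaskICSState) -> None:
    state.lock.exec_count += 1     # updated while the lock is still held
    state.lock.spin_lock.clear()


# Usage: four threads increment a shared counter inside the critical section.
lock = GlobalICSLock(cs_id=1, cs_start=0x1000, cs_end=0x1040)
counter = [0]


def worker(n: int) -> None:
    state = TaskICSState(lock)
    for _ in range(n):
        enter(state)
        counter[0] += 1
        leave(state)


threads = [threading.Thread(target=worker, args=(250,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0], lock.exec_count)
```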
The ICS implementation code consists of two parts: the ICSM-R context-switch routine, which provides the ICS lock mechanism, and the ICSM-R user task, which uses the ICS mechanism.
The ICSM-R context-switch routine is integrated with the pSOS+ kernel as a user-written routine that is called during a context switch. The call occurs at the point where the context of the switched-out task has been completely saved, and before the context of the switched-in task is loaded. pSOS+ provides the addresses of the Task Control Blocks (TCBs) of both the switched-in task and the switched-out task in machine registers. The TCB contains the entire context of a task, including the Program Counter (PC), so the context-switch routine can reset the PC in the TCB of a switched-in task if required.
The context-switch routine first checks whether the PC of the old task, which is about to be switched out, is within the critical section region; if so, it releases the spin-lock by setting the lock variable to zero. Next, the routine checks whether the new task, which is about to be switched in, is within the critical section region. If so, it attempts to reacquire the spin-lock without spinning. If successful, the routine checks whether there was a conflicting operation in the interim. If so, it sets the task's PC to re-execute the critical section. If there is no conflict, the task is allowed to continue where it left off when it was switched out. If the attempt to reacquire the lock is unsuccessful, the task is made to restart from the point of acquiring the lock for the critical section.
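The decision logic of the context-switch routine can be sketched as follows. This is a simplified model, not the pSOS+ user routine: TCBs are reduced to a saved program counter, addresses are plain integers, the names ICSLock, Task, and on_context_switch are hypothetical, and the conflict test (comparing the global execution counter with the copy a task took on entry) is our reading of the data structures described earlier.

```python
from dataclasses import dataclass


@dataclass
class ICSLock:
    """Minimal hypothetical lock state seen by the context-switch routine."""
    cs_start: int          # first address inside the critical section
    cs_end: int            # first address past the critical section
    acquire_pc: int        # address of the code that acquires the lock
    exec_count: int = 0    # completed executions of the critical section
    held: bool = False     # the spin-lock variable

    def in_region(self, pc: int) -> bool:
        return self.cs_start <= pc < self.cs_end

    def try_acquire(self) -> bool:
        """A single Test&Set attempt, without spinning."""
        if self.held:
            return False
        self.held = True
        return True


@dataclass
class Task:
    pc: int                    # program counter saved in the task's TCB
    exec_count_copy: int = 0   # lock.exec_count observed on entry
    retries: int = 0


def on_context_switch(old: Task, new: Task, lock: ICSLock) -> None:
    # 1. The switched-out task releases the lock if it is inside the section.
    if lock.in_region(old.pc):
        lock.held = False
    # 2. The switched-in task tries to reacquire the lock if it was inside.
    if lock.in_region(new.pc):
        if lock.try_acquire():
            if lock.exec_count != new.exec_count_copy:
                # A conflicting execution completed meanwhile: re-execute.
                new.pc = lock.cs_start
                new.exec_count_copy = lock.exec_count
                new.retries += 1
            # Otherwise the task resumes where it left off.
        else:
            # Lock busy: restart from the point of acquiring the lock.
            new.pc = lock.acquire_pc


# Usage: task `a` is preempted inside the section, another processor
# completes the section, then `a` is switched back in.
lock = ICSLock(cs_start=100, cs_end=200, acquire_pc=90, held=True)
a = Task(pc=150)
b = Task(pc=10)
on_context_switch(a, b, lock)
released_on_switch_out = not lock.held
lock.exec_count += 1
on_context_switch(b, a, lock)
print(released_on_switch_out, a.pc, a.retries, lock.held)  # True 100 1 True
```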
Our discussion of how ICSM-R is implemented is of necessity quite brief. We refer the interested
reader to our other reports for a more detailed discussion [9, 7].
3 ICSM-R Performance Analysis
The ICSM-R protocol has the intuitively good property that a task does not hold a lock while it is idle due to a context switch. However, a critical section might need to be executed many times because of conflicting commits made while the task was swapped out. A performance study is needed to determine when ICSM-R has better performance than permitting a task to hold its locks while switched out. In addition, a performance study is necessary to justify the effort of implementing ICSM-R.
We first study the performance of an implementation of the ICSM-R algorithm. We compared the performance of ICSM-R with a spin-lock algorithm that does not release the lock during a context switch
(LOCK-NR). As we are limited by the number of processors available for the implementation, and to better
understand the implications of various parameters, we constructed an analytical model. We validated the
analytical model by using discrete event simulation. We present the results of the experiments and the
analysis in the following sub-sections.
3.1 Experimental Performance Results
We implemented the ICSM-R algorithm by integrating it into the pSOS+ kernel, and ran an experiment. A global counter is protected by a critical section, implemented using a spin-lock. On each processor, we run four tasks. Of these four tasks, only one is the ICSM-R user task that increments the shared counter; the rest are dummy tasks. All four tasks are started under the control of a low-priority parent task.
We compared the performance of ICSM-R with a spin-lock algorithm that does not release the lock during a context switch (LOCK-NR). Each processor executes four tasks, where each task works for Tw units of time. One of the four tasks enters the critical section, where it stays for Tc time units. We varied the number of processors from one to four. In our experiment, Tw and Tc are random variables uniformly distributed within 20% either way of a selected mean. We set Tw to 20 milliseconds (ms). The time quantum for processor sharing among the four tasks is 20 ms. We varied the critical section execution time Tc (1, 5, and 10 ms) to reflect various load conditions, with best-case lock utilizations of 1.25%, 6%, and 10% per processor.
The performance results are shown in Figure 1. The experimental results confirm that as the number of processors increases, ICSM-R performs better than LOCK-NR. When the lock utilization is low, the performance improvement is small, as there is only a small probability that a context switch occurs during a critical section. As the lock utilization increases, this probability becomes higher. Releasing the lock during a context switch helps tasks running on other processors to acquire the lock sooner, thereby avoiding wasted cycles of spinning.
3.2 Analytical Performance of ICSM-R
The implementation results are encouraging, but incomplete. They do not explain why ICSM-R is better than LOCK-NR, and they do not permit extrapolation to new scenarios. To make up for these deficiencies, we developed an analytical performance model of both ICSM-R and LOCK-NR.
For the analytical model, we consider N processors running M tasks each. Each task on a processor works for Tw units of time. In addition, one of the M tasks on each processor requests the service of a critical section that takes Tc time units to execute. Each processor is shared among its M tasks in a round-robin fashion using a time quantum of Tq time units. The cycle time of a task is composed of the work time and the critical section time, if any, of the task. After completing an execution cycle, a task repeats the cycle. We are interested in estimating the cycle time of the task that uses the critical section.
We use the following notation for the analysis:
N : Number of processors
M : Number of tasks per processor
Tw : Work time for each task
Tq : Time quantum for processor sharing
Tc : Critical Section execution time
Rp : Critical section utilization per processor
R : Total critical section utilization
X : The total work time in a cycle
Z : The total time spent in waiting for the critical section in a cycle
B : The time spent holding the critical section in a cycle
E[Cx in Cs] : Expected number of context-switches while in critical section
C_N : Cycle time for a task using LOCK-NR
C_I : Cycle time for a task using ICSM-R
Figure 1: ICSM-R Performance Results (Tw = 20 milliseconds). Panels: critical section of 1, 5, and 10 milliseconds; x-axis: Processors (1 to 4); one curve is labeled Lock Not Released.
Figure 2: Model of a cycle for a task using LOCK-NR (segments X, Z, B)
3.2.1 Analysis of algorithm LOCK-NR
In this model, the critical section is not released during a context-switch while the critical section is being
executed by a task.
A cycle of the task using the critical section on a processor is shown in Figure 2. A cycle consists of the
work time X, Z units of time waiting for the critical section and B units of time in the critical section.
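We have,

$$C_N = X + Z + B \tag{1}$$

$$X = M \cdot T_w \tag{2}$$

B depends on Tc, Tq, and the number of context switches that are possible while holding the critical section:

$$E[\text{Cx in Cs}] = \frac{T_c}{T_q}$$

$$B = T_c + E[\text{Cx in Cs}] \cdot (M-1) \cdot T_q = T_c + \frac{T_c}{T_q} \cdot (M-1) \cdot T_q = M \cdot T_c \tag{3}$$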
Assuming that requests for the critical section are uniformly distributed in time, the probability of a conflict in using the critical section is just the fraction of time the critical section is in use. Making the approximation that the critical section can be modeled as an M/M/1 queue, the expected blocking time Z is given by

$$Z = \frac{R_r}{1 - R_r} \cdot B_r \tag{4}$$

where R_r is the utilization of the critical section by the other N - 1 tasks that use it, given by

$$R_r = \frac{N-1}{N} \cdot R$$

and B_r is the residual life [10] of the lock holding time B for which the task under consideration is blocked. Assuming that the lock holding time is uniformly distributed with mean B and variance $\sigma_B^2 = \sigma_{T_c}^2$, B_r is given by

$$B_r = \frac{B}{2} + \frac{\sigma_B^2}{2B}$$

Figure 3: Model of a cycle for a task using ICSM-R (segments X, Z1, B1, Z2, B2, ..., ZF, BF)
As only one task on each processor uses the critical section, the utilization per processor is given by

$$R_p = \frac{B}{X + Z + B}$$

Then, the critical section utilization is

$$R = N \cdot R_p = N \cdot \frac{B}{X + Z + B} = N \cdot \frac{B}{X + \frac{R_r}{1 - R_r} B_r + B} \tag{5}$$

X and B can be computed using equations 2 and 3, respectively. Because R_r depends on R, we compute R from equation 5 by iteration, setting the initial value of R to zero. Knowing R and B, Z can be computed, and hence the cycle time C_N.
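A minimal Python sketch of the LOCK-NR model follows. It evaluates equations 1 through 5 directly, but solves the fixed point of equation 5 by bisection instead of plain substitution. This is a swapped-in choice: the right-hand side of equation 5 decreases as R grows, so plain substitution produces iterates that alternate around the fixed point and may fail to settle at high load. The variance of Tc assumes a uniform distribution within 20% of the mean, as in the validation runs; the function name lock_nr_model is hypothetical.

```python
import math


def lock_nr_model(N: int, M: int, Tw: float, Tq: float, Tc: float, var_tc: float):
    """Return (C_N, R) for LOCK-NR from equations (1)-(5).

    var_tc is the variance of the critical section time, used for sigma_B^2.
    """
    X = M * Tw                                   # eq. (2)
    B = Tc + (Tc / Tq) * (M - 1) * Tq            # eq. (3), equal to M * Tc
    Br = B / 2 + var_tc / (2 * B)                # residual life of B

    def blocking(R: float) -> float:             # eq. (4)
        Rr = (N - 1) / N * R
        return math.inf if Rr >= 1 else Rr / (1 - Rr) * Br

    def excess(R: float) -> float:               # zero at the fixed point of eq. (5)
        return R - N * B / (X + blocking(R) + B)

    # excess() is increasing in R, negative at R = 0 and positive at the upper
    # bound (where R_r reaches 1), so bisection finds the unique fixed point.
    lo, hi = 0.0, (N / (N - 1) if N > 1 else 1.0)
    for _ in range(100):
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    R = lo
    return X + blocking(R) + B, R                # eq. (1)


# Validation setting: M = 4, Tw = 1000, Tq = 100; Tc uniform within +/-20%.
Tc = 10
var_tc = (0.4 * Tc) ** 2 / 12                    # variance of Uniform(0.8 Tc, 1.2 Tc)
for N in (1, 4, 16, 64):
    C_N, R = lock_nr_model(N, 4, 1000, 100, Tc, var_tc)
    print(f"N={N:3d}  C_N={C_N:9.2f}  R={R:.4f}")
```

At very high loads the fixed point can exceed R = 1, a sign that the queueing approximation no longer describes the system.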
3.2.2 Analysis of algorithm ICSM-R
In this model, the critical section is released during a context-switch while the critical section is being
executed by a task. The critical section is re-acquired and continued by the task during the next quantum.
This acquire/release cycle continues until the critical section is completed.
A cycle of the task using the critical section on a processor is shown in Figure 3. In this case, after the work period X, there is a possible blocking time Z1 spent waiting for the critical section to become free. Then there is the partial critical section holding time B1. At this point the task may experience a context switch and another blocking time, represented by Z2. This acquire/release sequence continues until the critical section is completed, ending with the final blocking time ZF and holding time BF. Whenever the critical section is re-acquired, the task may have to restart at the beginning of the critical section, if another task committed in the critical section between the time the critical section was last released and the time it is re-acquired.
We have,

$$C_I = X + Z + B \tag{6}$$

$$X = M \cdot T_w \tag{7}$$

where Z = Z1 + Z2 + ... + ZF and B = B1 + B2 + ... + BF. B depends on Tc and on the number of times a task is restarted from the beginning of the critical section because of a commit by another task:

$$B = T_c + T_r \cdot N_r$$

where T_r is the partial execution time of the critical section before a task is restarted because of a conflict, and N_r is the number of times the task is restarted because of a conflict before it commits. The expected value of T_r is Tc/2. Given the probability Pr[CS Restart] that a task restarts within the critical section,

$$N_r = \sum_{i>0} i \cdot \Pr[\text{CS Restart}]^i = \frac{\Pr[\text{CS Restart}]}{(1 - \Pr[\text{CS Restart}])^2}$$

Then,

$$B = T_c + \frac{T_c}{2} \cdot \frac{\Pr[\text{CS Restart}]}{(1 - \Pr[\text{CS Restart}])^2} \tag{8}$$

where

$$\Pr[\text{CS Restart}] = \Pr[\text{Cx in Cs}] \cdot \Pr[\text{a commit in the previous } (M-1)T_q \text{ interval}]$$

We can estimate the probability that some other task commits in the previous interval of length (M - 1)Tq by modeling the commits as a Poisson process with arrival rate

$$\lambda = \frac{N-1}{X + B + Z}$$

Then,

$$\Pr[\text{a commit in the previous } (M-1)T_q \text{ interval}] = 1 - \Pr[\text{no commit in the previous } (M-1)T_q \text{ interval}] = 1 - e^{-\lambda (M-1) T_q}$$

and

$$\Pr[\text{Cx in Cs}] = \begin{cases} T_c / T_q & \text{for } M > 1 \\ 0 & \text{otherwise} \end{cases}$$
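The restart quantities above can be evaluated with a short Python helper, given a current estimate of the cycle time X + B + Z (for instance, from the previous step of the iteration). The helper name restart_terms is hypothetical, and it assumes Tc ≤ Tq so that Tc/Tq is a valid probability. The block also checks numerically that the closed form for N_r matches the series.

```python
import math


def restart_terms(N: int, M: int, Tq: float, Tc: float, cycle: float):
    """Return (Pr[CS Restart], N_r, B) for ICSM-R given a cycle-time estimate.

    `cycle` is X + B + Z from the previous iteration; Tc <= Tq is assumed.
    """
    p_cx = Tc / Tq if M > 1 else 0.0                 # Pr[Cx in Cs]
    lam = (N - 1) / cycle                            # commit rate of the other tasks
    p_commit = 1.0 - math.exp(-lam * (M - 1) * Tq)   # Pr[a commit in (M-1)Tq]
    p = p_cx * p_commit                              # Pr[CS Restart]
    n_r = p / (1.0 - p) ** 2                         # sum over i > 0 of i * p**i
    B = Tc + (Tc / 2) * n_r                          # eq. (8), with T_r = Tc / 2
    return p, n_r, B


# The closed form for N_r agrees with a truncated version of the series.
p = 0.3
series = sum(i * p ** i for i in range(1, 200))
print(abs(series - p / (1 - p) ** 2) < 1e-12)        # True

print(restart_terms(N=4, M=4, Tq=100, Tc=10, cycle=4040.0))
```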
The blocking time Z is due to the critical section being busy and to a context switch while in the critical section:

$$Z = \Pr[\text{Cx in Cs}] \cdot (M-1) \cdot T_q + (1 + \Pr[\text{Cx in Cs}]) \cdot \frac{R_r}{1 - R_r} \cdot B_r \tag{9}$$

where R_r is the utilization of the critical section by the other N - 1 tasks that use it, given by

$$R_r = \frac{N-1}{N} \cdot R$$

and B_r is the residual life of the lock holding time B for which the task under consideration is blocked. Assuming that the lock holding time is uniformly distributed with mean B and variance $\sigma_B^2 = \sigma_{T_c}^2$, B_r is given by

$$B_r = \frac{B}{2} + \frac{\sigma_B^2}{2B}$$

As only one process on each processor uses the critical section, the utilization per processor is given by

$$R_p = \frac{B}{X + Z + B}$$

Then, the critical section utilization is

$$R = N \cdot R_p = N \cdot \frac{B}{X + Z + B} \tag{10}$$

X can be computed using equation 7. We can compute B and R with equations 8, 9, and 10 using iteration, setting the initial values of B to Tc and of R to zero. Knowing R and B, Z can be computed, and hence the cycle time C_I.
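Putting equations 6 through 10 together, the sketch below runs the iteration described above, starting from B = Tc and R = 0. It is a hypothetical implementation (icsm_r_model is not from the paper): the order of updates within one step is our choice, the variance of Tc again assumes a uniform distribution within 20% of the mean, and the function raises an error instead of returning a value when the critical section saturates or the iteration fails to settle.

```python
import math


def icsm_r_model(N: int, M: int, Tw: float, Tq: float, Tc: float, var_tc: float,
                 tol: float = 1e-10, max_iter: int = 10_000):
    """Return (C_I, R, B) for ICSM-R from equations (6)-(10) by plain iteration.

    Starts from B = Tc and R = 0 as in the text. Raises RuntimeError if the
    critical section saturates or the iteration does not settle.
    """
    X = M * Tw                                          # eq. (7)
    p_cx = Tc / Tq if M > 1 else 0.0                    # Pr[Cx in Cs], Tc <= Tq

    def blocking(B: float, R: float) -> float:          # eq. (9)
        Rr = (N - 1) / N * R
        if Rr >= 1:
            raise RuntimeError("R_r >= 1: critical section saturated")
        Br = B / 2 + var_tc / (2 * B)
        return p_cx * (M - 1) * Tq + (1 + p_cx) * Rr / (1 - Rr) * Br

    B, R = Tc, 0.0
    for _ in range(max_iter):
        Z = blocking(B, R)
        lam = (N - 1) / (X + B + Z)                     # Poisson commit rate
        p = p_cx * (1 - math.exp(-lam * (M - 1) * Tq))  # Pr[CS Restart]
        B_new = Tc + (Tc / 2) * p / (1 - p) ** 2        # eq. (8)
        R_new = N * B_new / (X + Z + B_new)             # eq. (10)
        converged = abs(B_new - B) < tol and abs(R_new - R) < tol
        B, R = B_new, R_new
        if converged:
            return X + blocking(B, R) + B, R, B         # eq. (6)
    raise RuntimeError("iteration did not converge")


Tc = 10
var_tc = (0.4 * Tc) ** 2 / 12       # assumed: Tc uniform within +/-20% of its mean
for N in (1, 4, 16, 64):
    C_I, R, B = icsm_r_model(N, 4, 1000, 100, Tc, var_tc)
    print(f"N={N:3d}  C_I={C_I:9.2f}  R={R:.4f}  B={B:.3f}")
```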
3.2.3 Validation of Analysis
We validated the analysis by simulation using SIMPACK [6], a discrete event simulation package. We set the
values of M = 4, Tw = 1000, and Tq = 100. The work time Tw is a random variable uniformly distributed between 800 and 1200 with a mean of 1000. The critical section time is also a uniformly distributed random variable with a range of 20% either way about the mean. In each experiment, we selected a different Tc from 10 to 90 to represent a wide range of critical section utilizations. The results comparing the cycle times obtained by simulation with the cycle times computed by analysis are given in Table 1. As can be seen, except at high Tc/Tq ratios, the analysis is accurate enough to be meaningful. A similar conclusion can be drawn from the lock utilizations obtained from simulation and analysis, presented in Table 2.
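The Abs.% Diff columns in Tables 1 and 2 appear to give the absolute difference between analysis and simulation as a percentage of the simulated value. This reading is our inference, not stated in the text; the Python check below reproduces the printed values for a few legible rows, although a few other rows differ from it in the last digit.

```python
def abs_pct_diff(simulation: float, analysis: float) -> float:
    """Absolute difference between analysis and simulation, as a % of simulation."""
    return abs(simulation - analysis) / simulation * 100


# (row, simulation, analysis, Abs.% Diff as printed in the tables)
rows = [
    ("Table 1, ICSM-R, Tc=50, N=16", 4279.15, 4318.48, 0.92),
    ("Table 1, ICSM-R, Tc=90, N=8", 4501.65, 4559.00, 1.27),
    ("Table 2, LOCK-NR, Tc=10, N=4", 4.00, 3.90, 2.50),
    ("Table 2, ICSM-R, Tc=50, N=12", 16.40, 17.57, 7.13),
    ("Table 2, LOCK-NR, Tc=90, N=16", 99.30, 100.00, 0.70),
]
for name, sim, ana, printed in rows:
    print(f"{name}: {abs_pct_diff(sim, ana):.2f} (printed {printed:.2f})")
```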
3.2.4 Performance comparison using analysis
We evaluated the performance of the ICSM-R and LOCK-NR algorithms using the model developed in the previous subsections.
The cycle times of the tasks using the critical section are shown in Figures 4 through 7, with Tc/Tq ranging from 10% to 90%. The work time Tw is 1000 units, and M is 4. For a critical section to quantum time ratio of up to 50%, the ICSM-R algorithm performs better than the LOCK-NR algorithm. Even for a higher ratio, the ICSM-R algorithm performs better for a small number of processors. We observe that there is a steep transition in the cycle times when the lock utilization reaches 100% for both algorithms. As expected, the critical section utilization is lower for ICSM-R, as shown in Figures 8 through 11. Because ICSM-R decreases the lock utilization, it improves the scalability of the application that uses it.
We analyzed the effect of multiprogramming by varying M from 1 to 8, with Tc = Tq/4. Except for the case of no multiprogramming, ICSM-R always performs better, as shown in Figures 12 through 14.
We conclude that for a low to moderate ratio of critical section execution time to time quantum, it is advantageous to use the ICSM-R algorithm. In general, critical section execution times are very short, only a fraction of the time quantum. Thus, the ICSM-R algorithm, which releases the lock during a context switch, is an attractive alternative to other lock-based algorithms.
Table 1: Validating cycle time analysis using simulation for ICSM-R
Cycle Time. Entries marked ? are illegible.

                      ICSM-R                                LOCK-NR
Tc   Processors   Simulation   Analysis   Abs.% Diff   Simulation   Analysis   Abs.% Diff
10    4           4044.30      4044.31    0.00         4044.50      4044.39    0.00
10    8           4040.04      4040.01    0.00         4040.64      4040.48    0.00
10   12           4049.11      4049.08    0.00         ?            ?          0.00
10   16           ?            ?          0.00         ?            ?          0.00
10   20           ?            4038.11    0.01         4042.03      4041.72    0.02
10   24           4039.85      4038.93    0.02         4042.91      4041.62    0.03
10   28           4049.14      4048.36    0.02         ?            ?          0.01
10   32           ?            4051.64    0.02         ?            ?          0.00
10   36           ?            ?          0.03         ?            ?          0.01
10   40           4046.91      4045.40    0.04         ?            ?          0.01
10   44           4039.62      4037.89    0.04         ?            4049.92    0.02
10   48           4049.14      4046.98    0.05         ?            4061.72    0.02
10   52           4048.51      4046.26    0.06         4064.73      ?          0.02
10   56           4044.58      4041.87    0.07         4064.06      ?          0.04
10   60           4046.44      4043.55    0.07         4068.55      4068.77    0.00
10   64           4049.33      4046.37    0.07         4076.04      ?          0.00
50    4           4216.49      4219.43    0.07         4220.63      4220.72    0.00
50    8           4230.69      4239.95    0.22         4245.23      4249.52    0.10
50   12           4254.30      4278.36    0.57         4298.42      4314.26    0.37
50   16           4279.15      4318.48    0.92         4370.40      4428.70    1.33
90    4           4408.98      4425.15    0.34         4413.87      4422.75    0.20
90    8           4501.65      4559.00    1.27         4518.83      4584.24    1.45
90   12           ?            ?          1.16         ?            5043.99    6.14
90   16           5078.08      5077.26    0.02         ?            7134.15    23.69
Table 2: Validating lock utilization analysis using simulation for ICSM-R
Lock Utilization (%)

                      ICSM-R                                LOCK-NR
Tc   Processors   Simulation   Analysis   Abs.% Diff   Simulation   Analysis   Abs.% Diff
10    4            1.00         0.99      1.00          4.00         3.90      2.50
10    8            2.00         2.02      1.00          7.70         7.90      2.59
10   12            3.00         3.06      0.99         12.00        11.90      0.83
10   16            4.10         4.10      0.00         15.60        15.80      1.28
10   20            5.10         5.18      1.57         19.70        19.81      0.56
10   24            6.20         6.20      0.00         23.00        23.76      3.30
10   28            7.20         7.30      1.39         27.30        27.60      1.10
10   32            8.20         8.30      1.22         30.70        31.50      2.60
10   36            9.20         9.40      2.17         34.10        35.40      3.81
10   40           10.30        10.50      1.94         38.30        39.47      3.05
10   44           11.30        11.52      1.95         43.00        43.34      0.79
10   48           12.30        12.60      2.44         45.40        47.30      4.18
10   52           13.30        13.60      2.26         49.80        51.20      2.81
10   56           14.40        14.71      2.15         52.50        55.18      5.10
10   60           15.40        15.76      2.34         56.50        59.00      4.42
10   64           16.40        16.81      2.50         59.80        62.90      5.18
50    4            5.00         5.01      0.20         18.70        18.93      1.23
50    8           10.60        10.87      2.55         37.50        37.68      0.48
50   12           16.40        17.57      7.13         55.70        55.70      0.00
50   16           22.50        24.62      8.79         72.50        72.35      0.21
90    4            9.20         9.09      1.20         32.50        32.52      0.06
90    8           20.80        21.70      4.33         63.50        62.88      0.98
90   12           35.70        38.80      8.68         90.50        85.73      5.27
90   16           54.40        57.70      6.07         99.30       100.00      0.70
4 Conclusions
In a multiprogrammed shared memory multiprocessor, tasks can be needlessly blocked if another task holds
a lock and then is switched out of its CPU. We present the ICSM-R protocol, in which a task releases its locks when it undergoes a context switch. The advantage of the ICSM-R protocol over other solutions to this problem is that ICSM-R is more general and easier to implement. The ICSM-R protocol avoids unnecessary blocking, but at the cost of possible re-executions of the critical section.
We evaluate the performance of the ICSM-R protocol in comparison to the no-release algorithm. We first present experimental results from an implementation, and find that ICSM-R always has better performance. Next, we develop analytical performance models of ICSM-R and of the no-release case. We validate the analytical models with a simulation. Using the analytical models, we compare the performance of ICSM-R and the no-release case. We find that if critical sections are short compared to the time quantum, then using the ICSM-R protocol gives faster response times and allows a more scalable application than taking no action on a context switch.
References
[1] J. Alemany and E. W. Felten, Performance Issues in Non-Blocking Synchronization on Shared-Memory Multiprocessors, Proceedings of the 11th Annual ACM Symposium on Principles of Distributed Computing, Vancouver, BC, Canada, 1992, pp. 125-134.
[2] T. E. Anderson, E. D. Lazowska, and H. M. Levy, The Performance Implications of Thread Management
Alternatives for Shared-Memory Multiprocessors, IEEE Transactions on Computers, Vol. 38, No. 12,
1989, pp. 1631-1644.
[3] T. E. Anderson and H. M. Levy, Scheduler Activations: Effective Kernel Support for the User Level
Management of Parallelism, ACM Transactions on Computer Systems, Vol. 9, No. 1, 1992, pp. 53-79.
[4] B. Bershad, Practical Considerations for Non-Blocking Concurrent Objects, IEEE 13th International
Conference on Distributed Computing Systems, Pittsburgh, PA, USA, 1993, pp. 264-273.
[5] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley Publishing Company, Reading, MA, USA, 1987.
[6] P. A. Fishwick, SIMPACK: Getting Started with Simulation Programming in C and C++, Technical
Report Electronic TR92-022, University of Florida, 1992.
[7] K. Harathi, Synchronization Algorithms for Real Time Systems, Ph.D. Thesis, Dept. of CIS, University
of Florida, 1995.
[8] M. Herlihy, A Methodology for Implementing Highly Concurrent Data Objects, ACM Transactions on
Programming Languages and Systems, Vol. 15, No. 5, 1993, pp. 745-770.
[9] T. Johnson and K. Harathi, Interruptable Critical Sections, Technical Report TR94-007, Dept. of CIS,
University of Florida, 1994. Available at ftp.cis.ufl.edu:cis/tech-reports/tr94/tr94.007.ps.Z.
[10] L. Kleinrock, Queueing Systems, Volume 1: Theory, John Wiley & Sons, New York, NY, USA, 1975.
[11] C. McCann, R. Vaswani, and J. Zahorjan, A Dynamic Processor Allocation Policy for Multiprogrammed Shared-Memory Multiprocessors, ACM Transactions on Computer Systems, Vol. 11, No. 2, 1993, pp.
146-178.
[12] Motorola Inc., pSOS+ RTEID-Compliant Real-Time Kernel User's Manual, Tempe, AZ, USA, 1990.
[13] Motorola Inc., VMEexec User's Guide, Second Edition, Tempe, AZ, USA, 1990.
[14] R. Rajkumar, L. Sha and J. P. Lehoczky, Real-Time Synchronization Protocols for Multiprocessors,
IEEE Real-Time Systems Symposium, Huntsville, Alabama, 1988, pp. 259-269.
[15] LihChyun Shu, Michal Young, and Ragunathan Rajkumar, An Abort Ceiling Protocol for Controlling
Priority Inversion, Proceedings of the First International Workshop on Real-Time Computing Systems
and Applications, Seoul, Korea, 1994, pp. 202-206.
[16] H. Takada, and K. Sakamura, Predictable Spin Lock Algorithms with Preemption, Proceedings of the
11th IEEE Workshop on Real-Time Operating Systems and Software, Los Alamitos, CA, USA, 1994,
pp. 2-6.
[17] H. Takada, and K. Sakamura, Real-Time Synchronization Protocols with Abortable Critical Sections,
Proceedings of the First International Workshop on Real-Time Computing Systems and Applications,
Seoul, Korea, 1994, pp. 48-52.
Figure 4: ICSM-R cycle times using analysis for Tc = 10 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 5: ICSM-R cycle times using analysis for Tc = 50 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 6: ICSM-R cycle times using analysis for Tc = 75 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 7: ICSM-R cycle times using analysis for Tc = 90 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 8: ICSM-R critical section utilization using analysis for Tc = 10 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 9: ICSM-R critical section utilization using analysis for Tc = 50 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 10: ICSM-R critical section utilization using analysis for Tc = 75 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 11: ICSM-R critical section utilization using analysis for Tc = 90 (Tq = 100; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 12: ICSM-R effect of multiprogramming on cycle time for Tc = 25 (M = 1; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 13: ICSM-R effect of multiprogramming on cycle time for Tc = 25 (M = 4; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)
Figure 14: ICSM-R effect of multiprogramming on cycle time for Tc = 25 (M = 8; curves: ICSM-R, LOCK-NR; x-axis: Processors, 0 to 100)