Citation
Using wait-free synchronization to increase system reliability and performance

Material Information

Title:
Using wait-free synchronization to increase system reliability and performance
Creator:
Berrios, Joseph Stephen
Place of Publication:
[Gainesville, Fla.]
Publisher:
University of Florida
Publication Date:
Language:
English

Subjects

Subjects / Keywords:
Algorithms ( jstor )
Architectural education ( jstor )
Buffer storage ( jstor )
Buyer beware ( jstor )
Computer programming ( jstor )
Databases ( jstor )
Libraries ( jstor )
Operating systems ( jstor )
Semaphores ( jstor )
Software ( jstor )
Computer and Information Science and Engineering thesis, Ph. D ( lcsh )
Dissertations, Academic -- Computer and Information Science and Engineering -- UF ( lcsh )
Parallel programming (Computer science) ( lcsh )
concurrency -- database -- distributed -- free -- operating -- software -- synchronization -- systems -- wait
Genre:
government publication (state, provincial, territorial, dependent) ( marcgt )
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Summary:
ABSTRACT: Wait-free synchronization has been recognized in the research literature as an effective programming technique in the development of concurrent programs. The concurrent programming community, however, has been slow to adopt this technique. Our research addresses the practical application of wait-free synchronization in the design of operating systems, distributed systems, and network applications. We demonstrate its use in the scheduler of the Linux operating system and in the design of client-server applications. The resultant programming code from using wait-free synchronization is more easily seen to be fault tolerant, yet suffers no performance penalty. The performance analysis shows that under appropriate conditions wait-free synchronization techniques outperform traditional locks. This practical demonstration of the benefits of wait-free synchronization should help foster its adoption in the development of computer software in which concurrent programming is relevant.
Thesis:
Thesis (Ph. D.)--University of Florida, 2002.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
General Note:
Title from title page of source document.
General Note:
Includes vita.
Statement of Responsibility:
by Joseph Stephen Berrios.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Berrios, Joseph Stephen. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
12/1/2004
Resource Identifier:
029833610 ( ALEPH )
78392043 ( OCLC )

Full Text

USING WAIT-FREE SYNCHRONIZATION TO INCREASE SYSTEM RELIABILITY AND PERFORMANCE

By
JOSEPH STEPHEN BERRIOS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2002

Copyright 2002 by Joseph Stephen Berrios

I dedicate my work to past, present, and future researchers, and to the brave men and women who are serving in our armed forces fighting for freedom and liberty during this time of national crisis.

ACKNOWLEDGMENTS

My deep appreciation goes to Dr. Manuel E. Bermúdez, chairman of my committee, for his guidance and advice. I would like to express my deepest gratitude and appreciation to Dr. Joseph N. Wilson, whose guidance and mentoring made this dissertation possible. I would like to express my gratitude to Dr. T Chris Carnes for his guidance in the field of databases. My deepest gratitude goes to Dr. Douglas D. Dankel for his careful review of and corrections to my dissertation. Special thanks go to Dr. A. Antonio Arroyo for being a member of my supervisory committee and for his guidance and suggestions.

I would like to express my gratitude to the many individuals who provided me with guidance and mentoring throughout my academic and professional career, especially Dr. Bjarne Stroustrup, Dr. Arnaldo Castro-Villa, Prof. Ramonita Villar, Dr. Peter M. Maurer, Dr. Rafael Perez, Dr. Warren Viessman, Jr., and Melissa Hilleary, for their support throughout my academic life. Also, my deepest gratitude goes to RADM Stephen T. Keith, CAPT Karl Yeakel, CAPT Max Norgart, CAPT Luis Posada, CDR Jim Lowder II, CDR Henry Johnson, CDR Ed Harter, CDR Craig Rouhier, CDR Samuel Coons, CDR Todd Chase, CDR Jim McGrath, CDR Alessandro Cuevas, CDR Fred Douglas, and LCDR Kurth Schaedel for their support and guidance throughout my military career.

Also, I would like to acknowledge the support of the administrative staff in the CISE department at the University of Florida, especially John Bowers, Ardiniece Caudle, Tami D. Blue, Linda Smith, and Debbie Buttler.

Special thanks go to Wes “The Rock” Harmon, Carol D. Zaborsky, Brenda Sánchez, and John J. Bowers for helping me keep my sanity and perspective while pursuing my doctoral studies. My gratitude goes to Bob Magnat for allowing me to test my ideas on his iMac. Special thanks go to my family, friends, and colleagues for their encouragement and unfailing support throughout my studies.

My doctoral studies have been partially funded by the National Science Foundation (NSF) Minority Engineering Doctorate Initiative (MEDI) Fellowship.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
2 WAIT-FREE SYNCHRONIZATION
3 LOW-LATENCY PATCH
    3.1 Adding Wait-Free Synchronization to the Patch
    3.2 Studies and Results
4 SOCKETS
    4.1 UNIX C Socket Library
    4.2 C++ Socket Library
5 DISTRIBUTED SYNCHRONIZATION
    5.1 Distributed Hash Table
    5.2 The Hybrid Readers and Writers Implementation
    5.3 Client and Server Implementation Issues
    5.4 Studies and Results
    5.5 Databases
6 CONCLUSION

APPENDIX

A LINUX LOW LATENCY PATCH
B SOCKET LIBRARY
C CLIENT-SERVER SIMULATION

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 System specifications
5-1 Linux System specifications
5-2 Mac OS X System specifications

LIST OF FIGURES

2-1 The compare-and-swap instruction
2-2 Sample function using wait-free synchronization
3-1 Original shmem_recalc_inode function
3-2 Modified shmem_recalc_inode function
3-3 Programming Logic code for shmem_recalc_inode
3-4 Triple that tests the shmem_recalc_inode logic
3-5 Programming logic for modified shmem_recalc_inode
3-6 Logic test for successful operation
3-7 Logic test for unsuccessful operation
3-8 Original set_running_and_schedule function
3-9 Modified set_running_and_schedule function
3-10 Function that wait-free synchronization cannot be implemented
3-11 Two threads of execution results
3-12 Four threads of execution results
3-13 Eight threads of execution results
3-14 Sixteen threads of execution results
4-1 Socket program using the traditional C library
4-2 UML diagram from exception classes
4-3 UML diagram for protocol classes
4-4 UML diagram for socket classes
4-5 Socket program using the C++ socket library
5-1 Link Class
5-2 RWLock Class
5-3 SortedList Class
5-4 HashTable Class
5-5 Wait-free increment function
5-6 Shared list
5-7 Sample code
5-8 Hashed table
5-9 First implementation of readers and writers using wait-free synchronization
5-10 Modified implementation of readers and writers using wait-free synchronization
5-11 The compare-and-swap instruction for the Intel architecture
5-12 The compare-and-swap instruction for the Power PC architecture
5-13 Two threads of execution results in Linux
5-14 Two threads of execution results in Mac OS X
5-15 Four threads of execution results in Linux
5-16 Four threads of execution results in Mac OS X
5-17 Eight threads of execution results in Linux
5-18 Eight threads of execution results in Mac OS X
5-19 Sixteen threads of execution results in Linux
5-20 Sixteen threads of execution results in Mac OS X
5-21 Writers results
5-22 Readers results
5-23 Readers and writers average results
6-1 Sample code

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

USING WAIT-FREE SYNCHRONIZATION TO INCREASE SYSTEM RELIABILITY AND PERFORMANCE

By
Joseph Stephen Berrios
December 2002

Chair: Manuel E. Bermúdez
Cochair: Joseph N. Wilson
Department: Computer and Information Science and Engineering

Wait-free synchronization has been recognized in the research literature as an effective programming technique in the development of concurrent programs. The concurrent programming community, however, has been slow to adopt this technique. Our research addresses the practical application of wait-free synchronization in the design of operating systems, distributed systems, and network applications. We demonstrate its use in the scheduler of the Linux operating system and in the design of client-server applications. The resultant programming code from using wait-free synchronization is more easily seen to be fault tolerant, yet suffers no performance penalty. The performance analysis shows that under appropriate conditions wait-free synchronization techniques outperform traditional locks. This practical demonstration of the benefits of wait-free synchronization should help foster its adoption in the development of computer software in which concurrent programming is relevant.

CHAPTER 1
INTRODUCTION

Synchronization is one of the key elements of the design of concurrent systems. Since the introduction of the critical section problem [DIJ65], researchers have posed many problems and possible solutions. In the research presented in this dissertation we provide a practical implementation and use of wait-free synchronization in the design of operating systems and distributed applications. We demonstrate the feasibility of using wait-free synchronization by modifying the Linux kernel. We also implement a client-server application that modifies a shared resource using wait-free synchronization.

The basic premise of a concurrent system, such as a distributed system or an operating system, is that it is a collection of independent processes that together harness greater computational power. The aim is to give users more computational power at the same or a lower cost. These design goals have not been completely realized [EDL95] because of technical difficulties inherent in the design of concurrent systems. Synchronization has been one of the greatest challenges in this area.

Most process communication in single processors and in shared-memory multiprocessors is done using shared memory rather than message passing. However, the operating systems for such machines provide message-passing primitives so that processes can communicate with other machines on a network. Operating systems such as Apple’s Mac OS X, UNIX, Linux, and MS Windows XP provide sockets and a variety of system calls for sending and receiving messages [PAT71, STE98]. A distributed operating system must provide and use message-passing primitives extensively. Many programming languages, such as Linda [CAR86] and Java [HAR00], include message-passing facilities. These languages include mechanisms for asynchronous and synchronous message passing.

The relationship between shared memory and message-passing systems was noticed and analyzed by Lauer and Needham [LAU79]. Stroustrup [STR82] researched the two styles and found that performance is about the same for client-server applications on shared-memory architectures, even though message passing is occasionally slower.

The problem of synchronization and critical sections has been studied extensively in the research literature [DIJ65, DIJ68A, DIJ68B, DIJ71, DIJ79, HAN77, LAM87]. The study of this problem began with the classical critical section problem described by Edsger Dijkstra [DIJ65]. Synchronization between two or more processes can be achieved as follows:

    Busy-waiting or spinning: A given executing process consumes processor cycles when it must wait [AND00, DIJ65], as defined by Dimitrovsky [DIM91].
    Context-switching or blocking: The waiting process relinquishes its processor while waiting [AND00, DIJ68A], as defined by Dimitrovsky [DIM91].
    Hybrid or two-phase blocking: The scheduler selects between busy-waiting and context-switching dynamically, as described by Ousterhout [OUS82].

Busy-waiting is achieved with no special hardware by using shared variables and atomic operations, such as load and store, on a shared-memory system [DIJ65, LAM87]. Much work has been done on efficient busy-waiting synchronization for machines with hardware cache coherence or locally accessible shared memory. Cache coherence is a protocol for managing the caches of a multiprocessor system so that no data are lost or overwritten before the data are transferred from a cache to main memory [JEN87]. The goal is to reduce the amount of serialization caused by contention for shared memory resources.

Rudolph and Segall [RUD84] proposed the use of a test-and-test-and-set loop for locking on systems with hardware support for cache coherence. The principle of this technique is to have processors spin, that is, iterate, while referencing only their local cache. Even though the use of the local cache improves throughput, this technique does not avoid the necessity of referencing shared memory when the lock is finally released. This operation can require time linear in the number of waiting processors.

Another popular technique is back-off, which has been applied to reduce memory contention while using locks [DIJ68A, GRA90]. In this technique explicit delays are dynamically calculated to respond to the level of contention experienced. Compared to a traditional busy-waiting scheme, back-off-based synchronization typically sacrifices fairness, since in most of these schemes newly arriving processors poll at a higher frequency than processors that have already been waiting for a long time.

Goodman, Vernon, and Woest [GOO89] proposed a primitive that builds a queue of waiting processors using the memory of the cache lines located at each processor. This implementation is a variant of test-and-set that fails when the issuing processor is not at the head of the queue. As a result, spinning is almost completely local.
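The test-and-test-and-set loop and the back-off scheme described above can be sketched in portable C++ as follows. This is a minimal illustration under stated assumptions, not code from the cited papers: the class name TTASLock, the cap kMaxDelay, the yield-based delay, and the helper count_under_lock are all hypothetical, and std::atomic stands in for the hardware test-and-set instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-test-and-set spin lock with exponential back-off.
// All names are illustrative assumptions, not taken from [RUD84] or [GRA90].
class TTASLock {
public:
    void lock() {
        int delay = 1;  // current back-off, measured in yield() calls
        for (;;) {
            // "Test": spin on a plain load, which a cache-coherent machine
            // can serve from the local cache while the lock is held.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            // "Test-and-set": one atomic exchange tries to take the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // Lost the race: wait, then double the delay up to a cap.
            for (int i = 0; i < delay; ++i) {
                std::this_thread::yield();
            }
            if (delay < kMaxDelay) {
                delay *= 2;
            }
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    static constexpr int kMaxDelay = 64;
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: several threads increment one counter under the lock.
long count_under_lock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    TTASLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Calling count_under_lock(4, 2000) should return 8000 if the lock provides mutual exclusion. Note that a thread that has just arrived starts with the shortest delay, which is the source of the unfairness mentioned above.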

Anderson [AND90] gave a software-only lock algorithm with similar behavior for computer architectures that provide a hardware cache coherency mechanism. Each waiting processor selects a unique spinning location within an array by using “fetch and increment.” Each spinning location is marked to indicate “wait” or “proceed.” All entries, with the exception of the initial entry, are initially marked to wait. To release a lock, the owner of the lock flags its array entry to wait and allows the next array entry to proceed. This technique almost eliminates spin-waiting memory traffic, which is accomplished by arranging the array elements to fall in different cache lines.

Graunke and Thakkar [GRA90] studied the effect of a variety of locking algorithms on the Sequent Symmetry architecture and described an algorithm similar to Anderson’s, in which a separate array element is permanently associated with each participating processor. The atomic instruction fetch and add is used to determine the location associated with the previous lock holder. Releasing the lock is done by changing the state of the executing task’s permanent location.

Mellor-Crummey and Scott [MEL91A, MEL91B] present a busy-waiting algorithm that uses a linked list to represent the spinning locations, so each processor spins on a location of its own choosing. This enables the spinning to take place locally even when the machine has no cache coherency mechanism in hardware.

A subtle problem with all these local-spin synchronization techniques is that they impose some queuing order on processors as they begin to synchronize. While this has the advantage of ensuring fairness, it also makes each waiting processor suffer from every delay experienced by each predecessor in the queue. Such delays might include shared memory contention; interrupt, exception, and page-fault handling; and preemptive scheduling. A lock algorithm proposed by Wisniewski et al. [WIS94] accommodates preemptive scheduling by skipping over a preempted waiter. Doing so, however, sacrifices fairness.

An alternate method of achieving busy-waiting is to stall the processor until the synchronization condition is satisfied. This avoids congesting the processor-memory interconnection bus or network with polling traffic. Examples of this type of synchronization are the full/empty bits on memory words of the Denelcor HEP [SMI81] and Harrison’s Add-and-Lambda proposal [HAR88].

Context switching is achieved using special constructs provided by the operating system or the programming language used. The most popular constructs are semaphores and monitors.

Edsger Dijkstra invented semaphores [DIJ68A]. He devised them as a tool for implementing mutual exclusion and for signaling the occurrence of events such as interrupts. A semaphore is a special kind of shared variable that is manipulated only by two atomic operations, P and V. The value of a semaphore is nonnegative. The V operation is used to signal the occurrence of an event, so it increments the value of the semaphore. The P operation is used to delay a process until an event has occurred, so it waits until the value of a semaphore is positive and then decrements the value. The power of this technique comes from the fact that P operations might have to delay and the waiting process can relinquish the processor to other active processes. Dijkstra also successfully used semaphores in the “THE” operating system, one of the first multiprogrammed operating systems. He also presented a seminal paper on cooperating sequential processes [DIJ68B]. His paper showed how to use semaphores to solve a variety of synchronization problems. The most important ones are the dining philosophers and the sleeping barber problems.

Hoare [HOA74] introduced the concept of split binary semaphores, even though Dijkstra was the one who later named this technique and illustrated its use. Of particular importance to this research is the fact that Dijkstra [DIJ79] presented a solution to the readers/writers problem using split binary semaphores. The term split binary semaphore comes from the fact that a set of n binary semaphores can be viewed as a single binary semaphore that has been split into n parts. The power behind this technique is that in general a split binary semaphore can be formed from any number of binary semaphores. Also, Dijkstra [DIJ80] showed how to implement general semaphores using only split binary semaphores.

Following Dijkstra’s papers on split semaphores, Andrews [AND89] developed the technique of passing the baton. Passing the baton is an optimization of Dijkstra’s algorithms [DIJ79, DIJ80] that is modeled after the baton used by athletes in a track and field event. Once a process finishes executing its critical section, it passes the baton to the next process in the waiting queue to access the critical section.

Many researchers have proposed variations on semaphores. Patil [PAT71] proposed a Pmultiple instruction, which waits until a set of semaphores are all non-negative and then decrements them. Reed and Kanodia [REE79] use instructions called event counts and sequencers, which can be used to construct semaphores but can also be used directly to solve additional synchronization problems. Faulk and Parnas [FAU88] have examined the kinds of synchronization that arise in hard-real-time systems, which have critical timing issues. Their argument is that in real-time systems, the P operation on semaphores should be replaced with two primitive operations: pass, which waits until the semaphore is nonnegative; and down, which decrements it.

The desire for abstraction in all these techniques culminated with the development of monitors [HOA74]. The development of monitors was inspired by the data encapsulation that originated with the class construct in Simula-67 [DAH70]. Monitors are program modules that provide more structure than semaphores and yet are, in theory, as efficient as semaphores. A monitor is a data abstraction mechanism that encapsulates the representation of an abstract data type and provides a set of operations that are the only means of operating on an object of monitor type.

Edsger Dijkstra [DIJ71] is generally credited with being the first to advocate using data encapsulation to control the access to shared variables in a concurrent program. He called the concept a secretary, but he provided no syntactic mechanism for programming secretaries. Per Brinch Hansen [HAN72] advocated the same idea, which he later embodied in a specific language proposal called a shared class [HAN73]. Monitors were created and popularized by Hoare [HOA74]. In this influential paper Hoare also presented numerous interesting examples, including a bounded buffer, interval timer, and disk head scheduler.

Concurrent Pascal [HAN75] was the first concurrent programming language that included monitors. The three main components were processes, monitors, and classes. Per Brinch Hansen [HAN77] documented the fact that Concurrent Pascal was used to write several operating systems. Several additional languages provide the monitor construct. Among them are Modula [WIR77] (not to be confused with Modula-2 [WIR85] and Modula-3 [NEL91]), Mesa [LAM80], Pascal Plus [WEL79], Emerald [RAJ91], and most notably, Java [ARN00, LEA00] and Ada 95 [ISO95].
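The P and V operations and the monitor idea described above can be illustrated together: the sketch below writes a counting semaphore in monitor style, with the value hidden inside a class and touched only through its operations. It is a minimal example assuming standard C++11 threading primitives (std::mutex and std::condition_variable); the class Semaphore, its value() accessor, and the helper count_with_semaphore are hypothetical names, not constructs from the cited papers.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Counting semaphore in monitor style: the representation is private and
// every operation runs under the monitor's own mutex.
class Semaphore {
public:
    explicit Semaphore(int initial) : value_(initial) { assert(initial >= 0); }

    // P: wait until the value is positive, then decrement it.
    // A waiting thread blocks and gives up the processor.
    void P() {
        std::unique_lock<std::mutex> guard(mutex_);
        positive_.wait(guard, [this] { return value_ > 0; });
        --value_;
    }

    // V: signal the occurrence of an event by incrementing the value.
    void V() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ++value_;
        }
        positive_.notify_one();
    }

    // Accessor added only so the example can be checked.
    int value() {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

private:
    std::mutex mutex_;
    std::condition_variable positive_;
    int value_;
};

// Hypothetical helper: a semaphore initialized to 1 guards a shared counter.
int count_with_semaphore(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Semaphore mutex(1);
    int shared = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                mutex.P();
                ++shared;
                mutex.V();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return shared;
}
```

With an initial value of 1 the semaphore behaves as a mutual-exclusion lock, so count_with_semaphore(4, 2000) should return 8000.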

Concurrent programs are modeled with two primary design methodologies, which are as follows:

    Shared memory (tightly coupled): Programs that use shared variables and usually run on uniprocessors and on multiprocessors that share memory.
    Distributed memory (loosely coupled, message-passing system): Programs that are designed to execute on multicomputers and networks of computers.

The distributed memory model originated with the introduction of the message-passing concept in the late 1960s [HOA78]. Even though general-purpose multiprocessors and computer networks did not exist at the time, some operating system designers realized that it would be attractive to design this type of system as a collection of processes. The main strengths that researchers envisioned with these systems are that every process has a specific function, and one process cannot interfere with another since they do not share variables.

The first message-passing mechanism was designed and implemented by Per Brinch Hansen [HAN70]. Brinch Hansen’s original nucleus provided four primitives that supported client/server communication using a shared pool of fixed-length buffers. Later he added two more primitives that allowed a process to examine its message queue and answer buffers and to receive specific messages. This allows a process to engage in more than one conversation at a time. Bic and Shaw [BIC88] provide a detailed overview of Per Brinch Hansen’s original primitives.

The research presented in this dissertation is built on the foundation laid by the research literature. The goal of these researchers was to increase throughput and reduce the latency of concurrent applications. We present the practical implementation of wait-free synchronization in the design of software systems.

CHAPTER 2
WAIT-FREE SYNCHRONIZATION

On some computer architectures synchronization can be achieved using low-level instructions such as compare-and-swap. This supports a type of synchronization called wait-free synchronization [HER91].

Using traditional lock synchronization, if a process in a critical section is rescheduled or if it encounters a page fault while accessing its stack, the lock will prevent other processes from accessing the critical section. If a process dies while holding a lock, no other process can access the critical section until the lock is forcibly released. This is often accomplished by nothing less than restarting the system. This situation can be avoided using wait-free synchronization.

Wait-free synchronization is a technique that relies on architectures that provide the compare-and-swap instruction. By using this instruction properly, mutual exclusion can be guaranteed without holding any locks. As a result, no executing process must wait to access a critical section; hence the name wait-free synchronization. In general the compare-and-swap instruction is implemented in computer processors as follows:

    atomic int CAS(int& source, int copy, int target)
    {
        /* The whole body executes as one indivisible step. */
        if (copy == source) {
            source = target;    /* success: install the new value */
            return copy;
        }
        return target;          /* failure: source is left unchanged */
    }

Figure 2-1: The compare-and-swap instruction
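Figure 2-1 is pseudocode for a hardware instruction and does not compile as written. Assuming a C++11 compiler, its return convention can be approximated on top of std::atomic, as in the sketch below. The use of compare_exchange_strong is a substitution made for this illustration and is not the dissertation's implementation; the wrapper keeps the name CAS only to match the figure.

```cpp
#include <atomic>
#include <cassert>

// Approximates the convention of Figure 2-1 with the standard library:
// returns `copy` when the swap succeeds and `target` when it fails.
// (Assumption: std::atomic<int> stands in for the shared memory word.)
int CAS(std::atomic<int>& source, int copy, int target) {
    int expected = copy;
    // compare_exchange_strong stores `target` only if `source` still
    // holds `expected`; the comparison and the store are one atomic step.
    if (source.compare_exchange_strong(expected, target)) {
        return copy;
    }
    return target;
}
```

For example, with a variable holding 5, CAS(x, 5, 9) returns 5 and leaves 9 in x, while a following CAS(x, 5, 7) returns 7 and leaves x unchanged. A caller can therefore tell success from failure only when copy and target differ, which holds in an increment where target is copy + 1.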

The compare-and-swap instruction takes three parameters: the original source, which is passed by reference; a copy of the value; and the desired new value. The instruction compares the source value to the copy. If the values are identical, source is assigned the new target value and the instruction returns the original value held in copy. If they are not equal, the compare-and-swap instruction returns the target value. A simple example of this technique is the code that increments a counter:

    void inc()
    {
        int old, last;
        do {
            old = counter;                      /* read the current value */
            last = CAS(counter, old, old+1);    /* try to install old+1 */
        } while (last != old);                  /* retry if the swap failed */
    }

Figure 2-2: Sample function using wait-free synchronization

The code enters a do-while loop. It assigns the value of counter to old. Then it assigns to last the result of the compare-and-swap instruction. If the value returned from the compare-and-swap instruction is equal to old, then the operation was successful. If not, the process repeats the loop and tries to increment the counter another time. As is evident from this code, if a process is rescheduled while accessing the shared variable, this will not prevent another process from performing operations on the shared variable. Even in the presence of a process failure, the system will continue to operate without hindrance.

One of the major practical limitations of this technique is that it has gained little acceptance in the software development community, even though it is still an active area of research [CHA99].
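The retry loop of Figure 2-2 can be exercised with real threads. The following is a minimal sketch assuming C++11 std::atomic and std::thread in place of the hardware instruction; the helper run_increments is a hypothetical name added for the demonstration, and compare_exchange_weak replaces the CAS of Figure 2-1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Shared counter, as in Figure 2-2 (here a std::atomic<int>).
std::atomic<int> counter{0};

// Increment without a lock: read, try to install old + 1, retry on failure.
void inc() {
    int old = counter.load();
    // On failure compare_exchange_weak writes the current value of
    // `counter` into `old`, so the next attempt uses a fresh reading.
    while (!counter.compare_exchange_weak(old, old + 1)) {
    }
}

// Hypothetical helper: start `threads` threads that each call inc()
// `per_thread` times, then return the final counter value.
int run_increments(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                inc();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

No increment is lost even though no thread ever holds a lock, so run_increments(4, 10000) should return 40000. A thread that stalls between its read and its compare-and-swap delays only itself.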


CHAPTER 3 LOW-LATENCY PATCH

Our research has focused on finding practical applications of wait-free synchronization in the development of software systems. One such system is the Linux operating system [BOV01]. Linux, by being an open source software product, allows us to investigate ways to incorporate wait-free synchronization within a mature system. The desire to increase throughput and reduce latency in the Linux operating system is an active area of software development. One such project is a low-latency patch developed by Andrew Morton [MOR01]. This patch optimizes task scheduling and in general reduces the latency for a given task to complete its execution. This patch relies heavily on locks.

3.1 Adding Wait-Free Synchronization to the Patch

We noted that wait-free synchronization could be used effectively in certain parts of the scheduler code in the low-latency patch for the Linux kernel. Two sections of shmem.c were easily modified to use wait-free synchronization, in the functions shmem_recalc_inode and shmem_delete_inode. The original version of shmem_recalc_inode is shown in Figure 3-1. The wait-free version of the code in Figure 3-1 is in Figure 3-2. To demonstrate that the two synchronization mechanisms presented are equivalent, we use a programming logic notation described by Andrews [AND91]. Programming logic is a formal logical notation that facilitates making precise statements


about the execution of a program. The formulas used in programming logic are triples of the form { P } S { Q }, where P is the pre-condition, S is the statement, and Q is the post-condition. An example of a triple is { x = 0 } x = x + 1 { x = 1 }. For atomic operations the < > brackets are used.

    static void shmem_recalc_inode(struct inode * inode)
    {
        unsigned long freed;

        freed = (inode->i_blocks/BLOCKS_PER_PAGE) -
                (inode->i_mapping->nrpages + SHMEM_I(inode)->swapped);
        if (freed) {
            struct shmem_sb_info * sbinfo = SHMEM_SB(inode->i_sb);
            inode->i_blocks -= freed*BLOCKS_PER_PAGE;
            spin_lock (&sbinfo->stat_lock);
            sbinfo->free_blocks += freed;
            spin_unlock (&sbinfo->stat_lock);
        }
    }

Figure 3-1: Original shmem_recalc_inode function

    static void shmem_recalc_inode(struct inode * inode)
    {
        unsigned long temp;
        unsigned long freed;

        freed = (inode->i_blocks/BLOCKS_PER_PAGE) -
                (inode->i_mapping->nrpages + SHMEM_I(inode)->swapped);
        if (freed) {
            struct shmem_sb_info * sbinfo = SHMEM_SB(inode->i_sb);
            inode->i_blocks -= freed*BLOCKS_PER_PAGE;
            /* retry until no other process changed free_blocks in between */
            do {
                temp = sbinfo->free_blocks;
            } while (temp != CAS(sbinfo->free_blocks, temp, temp + freed));
        }
    }

Figure 3-2: Modified shmem_recalc_inode function


The programming logic version of shmem_recalc_inode using a spin lock would be:

    lock) -> sbinfo->free_blocks += freed>

Figure 3-3: Programming Logic code for shmem_recalc_inode

The triple that demonstrates the behavior of the above programming logic statement is as follows:

    { sbinfo->free_blocks = 5 ^ freed = 10 }
    lock) -> sbinfo->free_blocks += freed>
    { sbinfo->free_blocks = 15 }

Figure 3-4: Triple that tests the shmem_recalc_inode logic

The programming logic version of shmem_recalc_inode using wait-free synchronization is:

    do true ->
        temp = sbinfo->free_blocks
        test = <CAS(sbinfo->free_blocks, temp, temp + freed)>
        temp == test -> break
    od

Figure 3-5: Programming logic for modified shmem_recalc_inode

The following logical statements demonstrate the behavior when there is success in modifying the critical section:

    do true ->
        { sbinfo->free_blocks = 5 ^ freed = 10 }
        temp = sbinfo->free_blocks
        { temp = 5 ^ sbinfo->free_blocks = 5 ^ freed = 10 }
        test = <CAS(sbinfo->free_blocks, temp, temp + freed)>
        { test = 5 ^ temp = 5 ^ sbinfo->free_blocks = 15 ^ freed = 10 }
        temp == test -> break
    od

Figure 3-6: Logic test for successful operation
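The successful case above, together with the case in which another process changes free_blocks between the read and the compare-and-swap, can be replayed deterministically in a single thread. The following is a sketch under assumptions: it uses std::atomic rather than the kernel code, the function add_freed and the struct Outcome are hypothetical names, and the interfering update (an addition of 5, which turns the initial 5 into 10) is simulated inline instead of coming from a real second process.

```cpp
#include <atomic>

// Hypothetical result record: final value and number of CAS attempts.
struct Outcome {
    unsigned long final_value;
    int attempts;
};

// Adds `freed` to a counter that starts at `start`, using a CAS retry loop.
// When `interfere` is true, a simulated second process adds 5 between the
// read and the first compare-and-swap, forcing exactly one failed attempt.
Outcome add_freed(unsigned long start, unsigned long freed, bool interfere)
{
    std::atomic<unsigned long> free_blocks{start};
    int attempts = 0;
    unsigned long temp;
    do {
        temp = free_blocks.load();
        if (interfere && attempts == 0)
            free_blocks.fetch_add(5);   // the other process gets in first
        ++attempts;
    } while (!free_blocks.compare_exchange_strong(temp, temp + freed));
    return {free_blocks.load(), attempts};
}
```

With start = 5 and freed = 10, the undisturbed run finishes in one attempt with 15. The disturbed run fails once, rereads 10, and finishes with 20, so neither update is lost.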


The following logical statements demonstrate the behavior when the modification of the critical section is not successful:

    do true ->
        { sbinfo->free_blocks = 5 ^ freed = 10 }
        temp = sbinfo->free_blocks
        { temp = 5 ^ sbinfo->free_blocks = 10 ^ freed = 10 }
        test = <CAS(sbinfo->free_blocks, temp, temp + freed)>
        { test = 15 ^ temp = 5 ^ sbinfo->free_blocks = 10 ^ freed = 10 }
        temp == test -> break /* operation unsuccessful, repeat loop */
    od

Figure 3-7: Logic test for unsuccessful operation

Here another process has changed sbinfo->free_blocks from 5 to 10 between the read and the compare-and-swap. By the definition in Figure 2-1, the failed compare-and-swap leaves sbinfo->free_blocks untouched and returns the target value, temp + freed = 15, which differs from temp. If an iteration is not successful in modifying the critical section, the loop is repeated until there is success in modifying the critical section. As can be seen, the spin-lock version and the wait-free version are equivalent.

Besides the obvious increments and decrements of integer variables, one function that caught our attention is set_running_and_schedule. This function takes a given thread to execute and then places it in the scheduler.

    void set_running_and_schedule(struct lolat_stats_t *stats)
    {
        spin_lock(&lolat_stats_lock);
        if (stats->visited == 0) {
            stats->visited = 1;
            stats->next = lolat_stats_head;
            lolat_stats_head = stats;
        }
        stats->count++;
        spin_unlock(&lolat_stats_lock);

        if (current->state != TASK_RUNNING)
            set_current_state(TASK_RUNNING);
        schedule();
    }

Figure 3-8: Original set_running_and_schedule function


Clearly, this code relies on spin locks. It has been demonstrated in the research literature [And00] that spin locks are more appropriate for parallel processing than traditional locking systems such as semaphores or monitors. The wait-free synchronization counterpart for this code is shown in Figure 3-9.

    /* precondition: stats->init is initialized to 0 externally */
    void set_running_and_schedule(struct lolat_stats_t *stats)
    {
        /* hybrid wait-free synchronization */
        int temp;

        /* try to grab stats->visited */
        if (cas(&stats->visited, 0, 1)) {
            /* initialization code */
            stats->next = lolat_stats_head;
            lolat_stats_head = stats;
            stats->init = 1;
        } else {
            /* if stats->init is set to 0, enter while loop */
            while (stats->init == 0)
                ;
        }

        /* operation to increment stats->count */
        do {
            temp = stats->count;
        } while (temp != CAS(stats->count, temp, temp+1));

        /* set task to executable and insert it in the scheduler */
        if (current->state != TASK_RUNNING)
            set_current_state(TASK_RUNNING);
        schedule();
    }

Figure 3-9: Modified set_running_and_schedule function

The lock version of this code looks somewhat simpler than the wait-free version, but the wait-free version is not very complex. The variable stats->init is initialized to


zero externally. The code in the beginning compares the value of stats->visited to zero. If the value of stats->visited is zero, it is assigned the value one and the body of the if statement is executed. Note that the body of the if statement is no longer inside a locked region. This is achieved by implementing two separate branches. The first branch allows only one process to change stats->visited to one. The others will not be allowed to access this part of the code.

The variable stats->init plays a major role in this section of code. After a process enters the first branch, all other processes have to wait until stats->init has the value one. The value of stats->init is changed to one when the process that is executing the first branch completes the initialization code within the body of this block. Because this code runs only once, a compromise was made: a condition blocks the other processes and allows only one process to execute the initialization.

At a glance this code seems to be a rework of the spin lock from the original code. In reality the two versions differ: the original code relies on lolat_stats_lock, which is a single lock for all lolat_stats objects, whereas in the wait-free version each lolat_stats_t object has its own visited field, so in effect each object has a unique lock instead of a global lock. Lindsley et al. [LIN02] discuss the use and misuse of the big kernel lock (BKL) in the Linux kernel. It is their contention that in many instances a global lock can be replaced by a simple spin lock. This code is a clear example of their contention.

The next task is to increment stats->count. To do so, the current value of stats->count must be copied. This must be done for the compare-and-swap


instruction to work. The next step of the algorithm is to execute the compare-and-swap instruction, passing stats->count, the value of temp, and the value of temp+1. When this instruction is executed, the operation becomes indivisible and the processor compares the values of stats->count and temp. If the values are identical, the compare-and-swap instruction sets stats->count to temp+1 and returns the value of temp, which in turn exits the do-while loop. If the values are not identical, the compare-and-swap instruction returns the value of temp+1 and the body of the do-while is repeated. Once this is done, the next step is to verify that the value of current->state is that of an executable task. If it is not, the value of current->state is changed to reflect that the process is an executing task. In the end, the process that has been set to run is placed in the scheduler.

Eight files that are used by the patch could not be modified to use wait-free synchronization. The synchronization in those files involved more than simply updating a single variable's value. For example, in the file filemap.c there is a function called add_page_to_inode_queue. The code for this function is as follows:

    static inline void add_page_to_inode_queue(struct address_space *mapping,
                                               struct page * page)
    {
        struct list_head *head = &mapping->clean_pages;

        spin_lock(&mapping->page_lock);
        mapping->nrpages++;
        list_add(&page->list, head);
        page->mapping = mapping;
        spin_unlock(&mapping->page_lock);
    }

Figure 3-10: Function in which wait-free synchronization cannot be implemented


Wait-free synchronization will not work in the previous example because multiple objects are being modified within the region protected by the spin lock. However, this does not prevent the use of this technique in other sections of the code. Wherever feasible, we replaced traditional locking with wait-free synchronization. The source code for the modified low-latency patch is located in Appendix A.

The properties that characterize the cases in which wait-free synchronization is a viable alternative are:

- A single variable needs to be modified.
- No other variable needs to be modified atomically after wait-free synchronization is performed on a given variable.

On the other hand, the properties that characterize the cases in which wait-free synchronization cannot be used are:

- Multiple variables need to be modified atomically.
- Complex data types, including structs and classes, that are not handled by the compare-and-swap instruction will not work. Herlihy [HER91] discusses an implementation of wait-free synchronization for complex objects, but it is not a straightforward solution.

After careful analysis of the occurrences of spin locks in the Linux source code in release 2.4.19 we found that:

- 5% of the spin locks can be replaced with wait-free synchronization (64 instances out of 1,284).
- 8% of the spin locks might be replaced with wait-free synchronization (103 instances out of 1,284).
- 87% of the spin locks cannot be replaced with wait-free synchronization (1,117 instances out of 1,284).
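The two groups of properties listed above can be illustrated side by side. The following is a minimal sketch in standard C++, not code from the patch: the struct Mapping, its fields, and the functions record_hit and add_page are hypothetical names chosen to echo Figure 3-10. A counter that changes on its own is updated with a compare-and-swap loop, while two fields that must change together remain under a lock.

```cpp
#include <atomic>
#include <list>
#include <mutex>

// Hypothetical structure containing both kinds of shared state.
struct Mapping {
    std::atomic<long> hits{0};    // modified alone: compare-and-swap suffices
    std::mutex page_lock;         // guards nrpages and clean_pages together
    long nrpages = 0;
    std::list<int> clean_pages;
};

// Single-variable update: retry loop, no lock held.
void record_hit(Mapping& m)
{
    long old = m.hits.load();
    while (!m.hits.compare_exchange_weak(old, old + 1)) {
        // old now holds the current value; retry with it
    }
}

// Multi-variable update: both fields must change as one unit, so the lock stays.
void add_page(Mapping& m, int page)
{
    std::lock_guard<std::mutex> guard(m.page_lock);
    ++m.nrpages;
    m.clean_pages.push_front(page);
}
```

The resulting design is hybrid in the sense used in this chapter: locks are kept only where more than one variable must be updated atomically.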


3.2 Studies and Results

These changes were implemented in the Linux operating system. For our experiments we used the Red Hat Linux 7.2 distribution. The computer we used for these experiments has the specifications outlined in Table 3-1.

Table 3-1. System specifications

    Processor         Intel Pentium III 1 GHz
    Memory            1 Gigabyte of RAM
    Motherboard       ABIT VP-6
    SCSI Controller   Adaptec SCSI Card 39160
    Hard disk         Seagate ST12550W (2 Gbyte SCSI hard disk)

To test the performance of the scheduler, a benchmark was written in C++. This benchmark creates threads using the POSIX library. Each thread executes a function that creates a list of unsorted elements and sorts them using the insertion sort algorithm. We selected insertion sort because it is computationally intensive and its time complexity is O(N^2). Because the time complexity is always N^2, the simulation results are more reliable.

We compiled one kernel using the low-latency patch. The other kernel was built using wait-free synchronization in the aforementioned parts of the code of the patch. This made the design hybrid, because other parts of the code still rely on locks. We ran the experiments under low contention (2 threads), moderately low contention (4 threads), moderately high contention (8 threads), and high contention (16 threads). For each set, we executed 20 iterations and averaged the time taken to perform the calculations.

As shown in Figures 3-11, 3-12, 3-13, and 3-14, the performance of both systems was very similar. There was no performance gain or penalty for using wait-free


synchronization. The real gain is that the modified patch using the hybrid design increases the reliability of the code. Fault tolerance is achieved because no process holds a lock in the modified sections. Even though these are small sections of code in which a crash is unlikely, this reliability gain should not be ignored. This is especially true since there are no performance penalties associated with the use of wait-free synchronization.

[Plot, 2 threads: Time (ms) versus Iterations (1,000 to 10,000); series: Locks clock, Hybrid clock]

Figure 3-11. Two threads of execution results


[Plot, 4 threads: Time (ms) versus Iterations (1,000 to 10,000); series: Locks clock, Hybrid clock]

Figure 3-12. Four threads of execution results

[Plot, 8 threads: Time (ms) versus Iterations (1,000 to 10,000); series: Locks clock, Hybrid clock]

Figure 3-13. Eight threads of execution results


[Plot, 16 threads: Time (ms) versus Iterations (1,000 to 10,000); series: Locks clock, Hybrid clock]

Figure 3-14. Sixteen threads of execution results
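The benchmark program itself is not listed in this chapter. As a rough sketch of the kind of workload described in Section 3.2, the following C++ code starts a number of threads, each of which builds an unsorted list and sorts it with insertion sort, and reports the elapsed time. Everything here is an assumption made for illustration: std::thread stands in for direct POSIX calls, and make_unsorted, insertion_sort, and run_benchmark are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Plain insertion sort, the workload named in Section 3.2.
void insertion_sort(std::vector<int>& v)
{
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}

// Builds n pseudo-random elements with a simple linear congruential generator.
std::vector<int> make_unsorted(std::size_t n, unsigned seed)
{
    std::vector<int> v(n);
    unsigned x = seed;
    for (std::size_t i = 0; i < n; ++i) {
        x = x * 1664525u + 1013904223u;
        v[i] = static_cast<int>(x >> 16);
    }
    return v;
}

// Runs `threads` workers, each sorting its own list of n elements.
// Returns the elapsed time in milliseconds, or -1 if any list is unsorted.
double run_benchmark(int threads, std::size_t n)
{
    std::vector<char> ok(threads, 0);
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([t, n, &ok] {
            std::vector<int> data = make_unsorted(n, 1234u + t);
            insertion_sort(data);
            ok[t] = std::is_sorted(data.begin(), data.end());
        });
    for (auto& th : pool)
        th.join();
    auto stop = std::chrono::steady_clock::now();
    if (std::count(ok.begin(), ok.end(), 0) != 0)
        return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling run_benchmark with 2, 4, 8, and 16 threads and list sizes from 1,000 to 10,000 would mirror the contention levels and the axis range of the figures above, although the timings depend on the machine and kernel in use.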


CHAPTER 4 SOCKETS

The cornerstone of network and distributed programming is message passing. Message passing is the mechanism that tasks use to communicate with each other. The most popular mechanism for message passing is the Berkeley Sockets interface, universally known as sockets [STE98]. Sockets are the de facto standard application programming interface (API) for networking, spanning a wide range of systems, such as Windows XP, Mac OS X, Linux, Palm OS, and the Java Virtual Machine (JVM).

Our original intention was to implement the distributed simulation using wait-free synchronization in Java. Unfortunately the JVM does not support the compare-and-swap instruction. A Java program can only access the native instructions of the processor by using the Java Native Interface (JNI). The section of the code that used the compare-and-swap instruction therefore had to be written in C++ or C. This fact led to the implementation of the server in the UNIX operating system using C++. The socket library provided by the traditional C interface made the process of porting the code from Java to C++ very difficult. This prompted the development of a socket library in C++ to facilitate the development of the server using wait-free synchronization. The biggest problem in developing the C++ implementation of the server is the socket library provided in the UNIX API [JOY86].


4.1 UNIX C Socket Library

The venerable socket library in UNIX is the original de facto standard that influences all socket libraries. Bill Joy originally implemented the socket library in the Berkeley UNIX OS [JOY86]. This library was implemented in classic C. A sample C program that uses this library is in Figure 4-1. The code can be divided into the section that initializes the socket and the section that actually uses the socket for communication. What is obvious from this code is the following:

- The initialization of sockets (line 15 to line 32) must be very precise. This makes programming sockets an error-prone activity.
- If there are errors in the use of sockets, the programmer must explicitly handle them. Error handling is contained in line 27 to line 31. This can lead to an increase in the size and complexity of the code.

These problems led to the development of a socket library in C++. It supports the goals of this dissertation by providing an abstraction for network communication that facilitates the development and implementation of a server that uses wait-free synchronization.

4.2 C++ Socket Library

The desire to simplify the development of applications that use sockets led to the development of this library. The goals of the development of the socket library are as follows:

- Increase the reliability of the programs.
- Make initialization issues as abstract as possible for the programmer.
- Simplify the use of sockets.

In developing this library I used a multi-paradigm approach employing the object-oriented and generic programming paradigms.


     1  #include <sys/types.h>
     2  #include <sys/socket.h>
     3  #include <stdio.h>
     4  #include <sys/un.h>
     5  #include <unistd.h>
     6
     7  int main()
     8  {
     9      int sockfd;
    10      int len;
    11      struct sockaddr_un address;
    12      int result;
    13      char ch = 'A';
    14
    15      /* socket initialization */
    16      /* Create socket for client */
    17      sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
    18
    19      /* Name the socket as agreed with the server */
    20      address.sun_family = AF_UNIX ;
    21      strcpy(address.sun_path, "server_socket");
    22      len = sizeof(address);
    23
    24      /* Now connect our socket to the server's socket */
    25      result = connect(sockfd, (struct sockaddr*) &address, len);
    26
    27      if (result == -1)
    28      {
    29          perror("oops: client1");
    30          exit(1);
    31      }
    32      /* end of socket initialization */
    33
    34      /* We can now read and write via: sockfd */
    35      write(sockfd, &ch, 1);
    36      read(sockfd, &ch, 1);
    37      printf("char from server = %c\n", ch);
    38      close(sockfd);
    39      exit(0);
    40  }

Figure 4-1: Socket program using the traditional C library

The object-oriented paradigm is a popular programming paradigm that has the following attributes:


- Encapsulation: This is a mechanism that allows the data and the functions for a given object to be contained in a unit. This mechanism is implemented using either a class or a module. This allows the programmer to create new types.
- Inheritance: Inheritance characterizes the relationship between classes. There must be an explicit is-a relationship between classes. For example, a human is-a Homo sapiens, an airplane is-a vehicle, a rose is-a plant, etc. Classes that are specialized inherit from classes that are more general.
- Polymorphism (Dynamic binding): Dynamic binding allows the implementation of generalized classes. A generalized class can be created without knowing in advance the specialized classes. At execution time, the program can determine the specific class to be used.

One of the major weaknesses of the object-oriented paradigm is that dynamic binding is performed at execution time. This slows the execution of programs, which is not acceptable for system programming in which performance is crucial. The generic programming technique allows binding to be performed at compile time. Combining these two approaches yields a multi-paradigm design that leads to an efficient design and implementation of software systems.

The socket library follows the design shown in Figures 4-2, 4-3, and 4-4. To facilitate its development, the socket library is divided into three sections: the exceptions, the protocols, and the socket classes.

The exception section contains a generalized SocketExceptions class and specialized socket exceptions. The generalized class implements two functions: message, which returns a string message that displays an error message, and what,


Figure 4-2: UML diagram for exception classes


Figure 4-3: UML diagram for protocol classes


Figure 4-4: UML diagram for socket classes


which is called whenever an exception is raised. The what function is written as a virtual function. It is declared in the generalized class, but it is in the specialized class where the function is actually defined. The specialized exception classes are as follows:

- SocketException
- AcceptException
- ConnectException
- HostnameException
- BindException
- SocketnameException
- ListenException
- IOException

A SocketException arises when a declaration error occurs. The other exceptions are specialized instances of SocketException that arise in the specific circumstances that their names imply.

The ProtocolImpl class is designed to implement the protocols to be used for communication. The most popular current communication protocol is Internet Protocol Version 4, commonly known as IPV4 [STE98]. The ProtocolImpl class implements the getDomain function that returns the value of domain. An example of a specialized class is ipv4. The functions defined in the class that supports the IPV4 protocol are:

- ipv4(): Constructor function.
- void setDomain(int x = AF_INET): Sets the domain type.


- int constr_name(const char *hostnm, int port): Builds an Internet socket name based on a hostname and a port number.
- char *ip2name(): Converts an IP address to a character string host name.
- SOCKET getsockname(SOCKET sid, int len): Obtains the socket name.
- in_port_t getPort(): Obtains the port number.
- void portUpd(int *port_p): Updates the port.
- SOCKET bind(SOCKET sid, int len): Assigns a UNIX or an Internet name to a socket.
- SOCKET accept(SOCKET sid): Accepts a client connection request.
- SOCKET connect(SOCKET sid, int len): Sends a connection request to a server.
- int recvfrom(SOCKET sid, char* buf, size_t len, int flag): Reads a message using a datagram.
- int sendto(SOCKET sid, const char* buf, size_t len, int flag, int size): Writes a message using a datagram.

The Socket class inherits from two classes: SocketSuper and SocketPlatform. The SocketSuper class is used to implement the naming functions that enable binding at compile time. The functions implemented by this class are the self function, which returns the SocketType, and the change function, which is designed to allow an object to change its SocketType.

The SocketPlatform class is designed to implement all the platform-specific functions. The functions implemented in this class are:

- SocketPlatform(): Constructor.
- int getpid(): Function that gets the process id.
- bool platformInit(): Function that initializes the socket.


- void platformClose(SOCKET sid): Function that closes the socket.
- void platformCloseSocket(SOCKET sid): Function that closes the socket and frees the resources.
- bool platformSocketError(SOCKET rc): Function that returns an error message.
- bool platformInvalidSocket(SOCKET rc): Function that determines whether a socket is invalid.

The Socket class is implemented with the following public functions:

- Socket(): Constructor that calls the init() function.
- ~Socket(): Destructor that frees all used resources.
- void init(): Function that binds to the init function from the specialized class.
- int SocketId(): Function that returns a socket id.
- void bind(const string address, int port = 0): Function that binds an address to a socket.
- void connect(const std::string hostnm, int port = -1): Function that actively attempts to establish a connection.
- void close(): Function that releases a socket connection.
- void setBuffer(int x): Function that sets the size of the buffer.
- int bufferSize(): Function that returns the size of the buffer.
- void write(const string buf): Function that binds to the write function from a specialized class. This function sends a message through a socket.
- void read(string& buf): Function that binds to the read function from a specialized class. This function reads a message from a socket.
- int shutdown(int mode = 2): Function that shuts down the connection of a socket.
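The way a general class can bind to a function of a specialized class at compile time is easiest to see in a small example. The following sketch uses the curiously recurring template pattern, one common C++ idiom for compile-time binding. It is an assumption that this resembles the mechanism behind SocketSuper; the library source in Appendix B is the authority. The names SocketBase, StreamLike, DatagramLike, and initImpl are hypothetical and do not come from the library.

```cpp
#include <string>

// General class: forwards init() to the specialized class without a virtual call.
template <typename Derived>
class SocketBase {
public:
    std::string init()
    {
        // Resolved at compile time: Derived is known when the template is instantiated.
        return static_cast<Derived*>(this)->initImpl();
    }
};

// Two hypothetical specializations, each supplying its own initImpl().
class StreamLike : public SocketBase<StreamLike> {
public:
    std::string initImpl() { return "stream"; }
};

class DatagramLike : public SocketBase<DatagramLike> {
public:
    std::string initImpl() { return "datagram"; }
};
```

A call such as StreamLike().init() is dispatched without a virtual table lookup, which is the performance argument made for generic programming earlier in this section.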


From the socket library two specialized classes were implemented. These are StreamSocket, which has a specialized class called ServerStreamSocket, and DatagramSocket.

The StreamSocket class uses the Transmission Control Protocol (TCP, also known as stream-based sockets). TCP provides reliability and guarantees that the data will get to its destination. The StreamSocket class provides the following functions:

- StreamSocket(): Constructor.
- void init(): Function that initializes a socket.
- void bind(const char* address = NULL, int port = 0): Function that assigns a UNIX or an Internet name to a socket.
- int accept(std::string& address, int* port_p): Blocks the caller until a connection request arrives.
- void write(const string buf): Function that writes a message to a connected stream socket.
- void read(string& buf): Function that reads a message from a connected stream socket.

A ServerStreamSocket is a specialization of the StreamSocket class. This class only contains a constructor. This constructor performs the same function as initializing a StreamSocket object with the server parameters.

The DatagramSocket class implements the User Datagram Protocol (UDP). This protocol does not guarantee that the packets will be delivered nor that they will arrive in the order sent by the originator. The main advantage is that this protocol does not consume as many resources as TCP, and it is much faster than TCP. The functions provided by this class are:

- DatagramSocket(): Constructor.


- void init(): Function that initializes a socket.
- void bind(const char* address = NULL, int port = 0): Function that assigns a UNIX or an Internet name to a socket.
- void write(const string buf): Function that writes a message to a connected datagram socket.
- void read(string& buf): Function that reads a message from a connected datagram socket.

The source code for the socket library is located in Appendix B. A sample C++ program that uses this library is shown in Figure 4-5.


    #include <iostream>
    #include <string>
    #include <cstdio>
    #include "Socket.h"

    using namespace std;

    int main(int argc, char* argv[])
    {
        const string MSG1 = "Hello MSG1";
        string buf;

        int port = -1;
        if (argc < 2)
        {
            cerr << "usage: " << argv[0]
                 << " []\n";
            return 1;
        }

        // check if a port number or a socket name is specified
        sscanf(argv[1], "%d", &port);

        // 'host' may be a socket name or a host name
        char *host = (port == -1) ? argv[1] : argv[2];

        try {
            // create a client socket and connect it
            net::StreamSocket socket(host, port);

            // send MSG1 to a server socket
            socket.write(MSG1);

            // read MSG2 from server
            socket.read(buf);
            cout << "Client: received msg using read: " << buf
                 << endl;

            // shut down socket explicitly
            socket.shutdown();
        }
        catch(net::SocketExceptions& e)
        {
            cout << e.what() << endl;
        }
    }

Figure 4-5: Socket program using the C++ socket library
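The single catch clause in Figure 4-5 works because every specialized exception derives from the generalized exception class. A minimal sketch of that arrangement follows. It is an illustration under assumptions, not the library's code: the classes are redefined here outside the net namespace, the return type and const qualifier of what are guesses, and try_connect is a hypothetical helper.

```cpp
#include <string>

// Generalized exception class with a virtual what(), as described in Section 4.2.
class SocketExceptions {
public:
    virtual ~SocketExceptions() {}
    virtual std::string what() const { return "socket error"; }
};

// One specialized exception that supplies its own definition of what().
class ConnectException : public SocketExceptions {
public:
    std::string what() const override { return "connect failed"; }
};

// Hypothetical helper: raises the specialized exception and catches it
// through a reference to the generalized class.
std::string try_connect(bool fail)
{
    try {
        if (fail)
            throw ConnectException();
        return "connected";
    } catch (const SocketExceptions& e) {
        return e.what();   // dynamic binding selects ConnectException::what
    }
}
```

With this structure the client code needs only one handler, which is how the library removes the explicit per-call error checks seen in Figure 4-1.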


CHAPTER 5 DISTRIBUTED SYNCHRONIZATION

The synchronization techniques we have presented in Chapter 3 are based on using shared variables. Therefore, they can be implemented only in computer systems whose hardware provides shared memory. Distributed systems are now common. They include distributed-memory multicomputers as well as networks of workstations. In a distributed system, processors have their own private memory and they interact using messages over a network rather than sharing memory. Concurrent programs that employ message passing are called distributed programs, because the threads can be distributed across multiple processors that may or may not be physically connected. This chapter presents the use of wait-free synchronization in the design of distributed applications.

5.1 Distributed Hash Table

A hash table is an effective and popular data structure used for many applications. Hash tables are useful for applications requiring a dynamic set of operations such as insert, search, and delete. For example, a compiler for a programming language maintains a symbol table, in which the keys of the elements are character strings that correspond to reserved words in the language. Another example is the implementation of a Database Management System (DBMS) [GAR00]. A DBMS maintains a set of records, in which keys are assigned to each entry for efficient access of the records. A


hash table in a database is considered the best technique for an exact match search. Unlike B-trees, it is possible to implement a hash table that can be modified with a single lock. Even though searching for a given element in a hash table can take as long as Θ(n) in the worst case, in practice hash tables perform extremely well. Normally the expected time for an element search in a hash table is O(1).

Dimitrovsky [DIM86] introduced a parallel hash table that uses a group lock. A group lock is not really a lock in the same sense as a binary semaphore or a readers/writers lock. The primary operations are lock and unlock, but the number of processors allowed to access a shared resource is not controlled in the same way as with other kinds of locks. The group lock [DIM86, DIM88, DIM91] delays the caller, if necessary, until a new group is created. A group is a collection of processors that have not yet executed the unlock operation. Only one group at a time is allowed to execute, and any other processor using the lock operation must wait to join a later group after the current one completes its task. The parallel hash table has the usual advantages and disadvantages of a serialized hash table:

- The average number of items examined in a search is O(1), but the worst case is Θ(n) for n items.
- The hash table size must be determined in advance.

Wood [WOO89] presented the hq algorithm, which allows the hash table to be extensible. For practical purposes this algorithm is not popular due to its additional complexity and dynamic storage management requirements. It should be noted that when a process needs to modify the hash table, the whole table becomes serialized. Edler [EDL95] implemented a parallel hash table in the Symunix operating system. The implementation of this algorithm was also done using a group lock.


The research for this dissertation began as an effort to implement a parallel hash table that used wait-free synchronization. It was noted that only very simple operations could be achieved. It was also noted that a redesign of the parallel hash table using a lock for each bucket could lead to a better solution to the problem of bottlenecks. This led to the notion of implementing a hybrid design of a readers/writers solution that utilizes both locks and wait-free synchronization.

To accommodate the nature of the hybrid readers/writers implementation, a new formulation of the problem is required. The traditional readers/writers problem has the following rules:

- Only one writer is permitted to execute at a time.
- Reading is not permitted while a writer is executing.
- Multiple readers can execute asynchronously.

The reformulated readers/writers problem adds these new rules:

- For complex writing operations, such as deleting a link in the hash table, a writer lock is used.
- For simple operations, such as incrementing a variable, a reader lock is used and wait-free synchronization is used to modify the variable.

5.2 The Hybrid Readers and Writers Implementation

There are essentially three classes that are needed to implement the parallel hash table:

- The links: This class implements each data item of the list. Each data element in the hash table contains an object of this type that includes the key, the shared variable called counter, and a pointer to the next object of type Link. The Link class is shown in Figure 5-1.
- The readers/writers lock: This class implements the readers and writers lock. The public interface provides the read and write operations using locks and the increment and decrement operations using wait-free synchronization. This class is shown in Figure 5-2.


    class Link {
    public:
        int iData;      // data item (key)
        int counter;    // counter
        Link* pNext;    // next link in list
    }; // end class Link

Figure 5-1. Link Class

    class RWLock {
        pthread_mutex_t mutex;
        pthread_cond_t read;    // wait for read
        pthread_cond_t write;   // wait for write
        int valid;              // set when valid
        int r_active;           // readers active
        int w_active;           // writer active
        int r_wait;             // readers waiting
        int w_wait;             // writers waiting
        static const int RWLOCK_VALID = 0xfacade;

        void readCleanup();
        void writeCleanup();
        int readTryLock();
        int writeTryLock();
    public:
        RWLock();
        ~RWLock();
        int readLock();
        int readUnlock();
        int writeLock();
        int writeUnlock();
        bool wfInc(Link &link, int x);
        bool wfDec(Link &link, int x);
    };

Figure 5-2. RWLock Class

- The sorted list: A collection of links placed in order:

    class SortedList {
        Link *pFirst;   // ref to first list item
        RWLock rw;
    };

Figure 5-3. SortedList Class

- The hash table: The class for the hash table encapsulates the classes previously discussed as members of the class. Among them is a vector of SortedList and threads of execution:


    class HashTable
    {
        vector<SortedList> hashArray;   // vector of lists
        thread *thread;
        int numThreads;
        int status;                     // status of threads
        int arraySize;
    }; // end of class HashTable

Figure 5-4. HashTable Class

When a hash table object is created, a thread of execution is created for each bucket in the hash table. For synchronization, a readers/writers lock is provided. This lock provides the traditional lock operations, plus two separate functions used for wait-free synchronization.

As previously discussed, the compare-and-swap instruction is the backbone of wait-free synchronization. Its use in wait-free algorithms is simple and straightforward. For example, the increment of the counter without using locks would be implemented as follows:

    bool RWLock::wfInc(Link &link, int x)
    {
        int old = link.counter;
        int result = old + x;
        // CAS returns the value it found in link.counter; the
        // update succeeded only if that value equals old
        if (old == CAS(link.counter, old, result))
            return true;
        return false;
    }

Figure 5-5. Wait-free increment function

Another subtle weakness of wait-free algorithms is in the realm of parallel programming. This is due to the fact that each processor can execute an independent compare-and-swap instruction. For one processor, there is a guarantee that only one thread of execution will execute the compare-and-swap instruction atomically. But with


two or more processors, two compare-and-swap instructions can execute concurrently. Figure 5-6 demonstrates an example of a single list shared by two processors.

    Shared by CPU 1 and CPU 2

    key         2     5    12    13    19
    counter   100    29     7    60     9

Figure 5-6. Shared list

If the scheme of Figure 5-6 is used with the code shown in Figure 5-7, both compare-and-swap instructions can run simultaneously and create a race condition. The possible values for link->counter are 65, 70, or 75.

A possible solution to this problem is to implement a partially shared memory. The hashed synchronization table solves this problem. Figure 5-8 shows how the hash table would handle the data. In Figure 5-8 there are 4 buckets in the hash table being processed on a dual-processor architecture. Each processor has 2 buckets, and each data item is hashed using the key. CPU 1 has exclusive access to lists 0 and 2, and CPU 2 has exclusive access to lists 1 and 3. By the nature of this design, the problem that arises from the configuration in Figure 5-6 is solved and the race condition is eliminated, because exactly one processor has access to each data item.
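The partitioning idea described above can be sketched as follows. This is a minimal illustration, not the dissertation's HashTable class: the names `bucketOf`, `ownerOf`, and `applyPartitioned` are hypothetical, the hash function is assumed to be a simple modulus, and standard C++ threads stand in for the two processors. Each worker applies only the updates whose bucket it owns, so no two workers ever modify the same counter.

```cpp
#include <cstddef>
#include <map>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sizes matching the example: 4 buckets, 2 processors.
constexpr std::size_t kBuckets = 4;
constexpr std::size_t kWorkers = 2;

// Assumed hash function: key modulo the number of buckets.
std::size_t bucketOf(int key) { return static_cast<std::size_t>(key) % kBuckets; }

// Worker 0 owns buckets 0 and 2; worker 1 owns buckets 1 and 3.
std::size_t ownerOf(std::size_t bucket) { return bucket % kWorkers; }

// Apply (key, delta) updates. Every worker scans the whole request list
// but touches only the buckets it owns.
std::vector<std::map<int, int>> applyPartitioned(
    const std::vector<std::pair<int, int>>& updates)
{
    std::vector<std::map<int, int>> buckets(kBuckets);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < kWorkers; ++w) {
        workers.emplace_back([&buckets, &updates, w] {
            for (const auto& u : updates) {
                const std::size_t b = bucketOf(u.first);
                if (ownerOf(b) == w)
                    buckets[b][u.first] += u.second;  // only this worker uses bucket b
            }
        });
    }
    for (auto& t : workers)
        t.join();
    return buckets;
}
```

Under these assumptions, the three updates to key 13 (an initial 60, then +5 and +10) are all applied by the same worker, so the counter ends at 75.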


    // CPU 1
    // link->key == 13
    int old, result, test;
    do {
        old = link->counter;
        result = old + 5;
        test = CAS(link->counter, old, result);
    } while (old != test);   // repeat until the CAS succeeds

    // CPU 2
    // link->key == 13
    int old, result, test;
    do {
        old = link->counter;
        result = old + 10;
        test = CAS(link->counter, old, result);
    } while (old != test);   // repeat until the CAS succeeds

Figure 5-7. Sample code

5.3 Client and Server Implementation Issues

The original implementation of the distributed simulation was as follows. The client sends a request for a value to modify. The server sends the value and the client performs the computation. The client then sends the values needed to perform the compare-and-swap operation for wait-free synchronization. The server executes the compare-and-swap instruction and returns the result. The client request is completed if the client receives a message that the operation was successful. If it was not successful, the operation is repeated. The implementation of this model is shown in Figure 5-9.

One of the drawbacks of this design is that the client must perform all the complex operations of wait-free synchronization. But the most subtle and most dangerous problem is that this design leads to heavy traffic contention, which can lead to message collisions. These collisions adversely affected system performance, often leading to a system crash.
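The cost of the client-driven protocol can be sketched with an in-process model. The `Server` struct and `clientIncrement` function below are hypothetical stand-ins, not code from the dissertation: each member call represents one request/reply exchange over the network, and `std::atomic` is assumed to play the role of the server's compare-and-swap.

```cpp
#include <atomic>

// Hypothetical in-process model of the server; each call stands for one
// request/reply exchange between client and server.
struct Server {
    std::atomic<int> value{0};

    int fetch() { return value.load(); }

    bool compareAndSwap(int expected, int desired)
    {
        return value.compare_exchange_strong(expected, desired);
    }
};

// Client-driven update: returns the number of round trips that were needed.
int clientIncrement(Server& server, int delta)
{
    int roundTrips = 0;
    for (;;) {
        const int old = server.fetch();                           // exchange 1: read
        ++roundTrips;
        const bool ok = server.compareAndSwap(old, old + delta);  // exchange 2: CAS
        ++roundTrips;
        if (ok)
            return roundTrips;   // on failure both exchanges are repeated
    }
}
```

Even without contention every update costs two exchanges, and each failed compare-and-swap adds two more, which is consistent with the heavy message traffic described above.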


[Diagram: the shared list of keys 2, 5, 12, 13, 19 with counters 100, 29, 7, 60, 9 is distributed over four buckets (hash 0 to hash 3), each holding key and counter pairs. Hash 0 and hash 2 are assigned to CPU 1; hash 1 and hash 3 are assigned to CPU 2.]

Figure 5-8. Hashed table

This problem led to a different approach to the implementation. The designs of the client and server are based on the notion that the server receives the request to perform an operation on the shared variable. In this case the requested operation is an increment by a certain amount. If the attempt is unsuccessful, the server repeats the operation of attempting to modify the variable. Once it succeeds, it communicates to the client that the operation was a success. The implementation of this model is shown in Figure 5-10.
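A minimal sketch of the server-side retry, assuming the shared variable is held in a `std::atomic<int>` and that the reply is a simple text token. The function name `serveIncrement` and the "done" reply are illustrative assumptions rather than the dissertation's code.

```cpp
#include <atomic>
#include <string>

// Hypothetical server-side handler: the client sends only the increment,
// and the server retries the compare-and-swap until it is applied.
std::string serveIncrement(std::atomic<int>& shared, int increment)
{
    int old = shared.load();
    // On failure, compare_exchange_weak stores the current value in `old`,
    // so the next attempt uses fresh data.
    while (!shared.compare_exchange_weak(old, old + increment)) {
    }
    return "done";   // single reply to the client
}
```

In this form the client exchanges a fixed number of messages per update regardless of how many times the server has to retry.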


    // receive_message() and send_message() are assumed helpers that
    // exchange text messages; send_message() is assumed to convert
    // its argument to text (booleans as "true" or "false").
    void server()
    {
        string message;
        int key;
        int old, newValue, result;

        message = receive_message();
        key = atoi(message.c_str());
        old = hash.value(key);
        send_message(old);
        message = receive_message();
        newValue = atoi(message.c_str());
        result = CAS(hash.value(key), old, newValue);
        if (result == old)
            send_message(true);
        else
            send_message(false);
    }

    void client()
    {
        string message;
        int key;
        int value;

        do {
            send_message(key);
            message = receive_message();
            value = atoi(message.c_str());
            value = value + 1;
            send_message(value);
            message = receive_message();
        } while (message == "false");
    }

Figure 5-9. First implementation of readers and writers using wait-free synchronization

5.4 Studies and Results

The distributed hash table was implemented in ISO C++ and the source code is located in Appendix C. The software was compiled and run on Red Hat Linux 7.3 and Mac OS X 10.1. The Linux OS ran on a dual-processor Intel Pentium III 1 GHz and the technical specifications of this computer are in Table 5-1. The Mac OS


X implementation ran on a dual-processor Power PC G4 1 GHz and the technical specifications of this computer are in Table 5-2. It is of interest to note the difference in the implementation of the compare-and-swap instruction on the Intel Pentium III and the Motorola Power PC G4.

    // receive_message() and send_message() are assumed helpers that
    // exchange text messages between the client and the server.
    void server_increment()
    {
        string message;
        int key;
        int old, result, increment;

        message = receive_message();
        key = atoi(message.c_str());
        send_message(ACK);
        message = receive_message();
        increment = atoi(message.c_str());
        do {
            old = hash.value(key);
            result = CAS(hash.value(key), old, old + increment);
        } while (result != old);
        send_message(done);
    }

    void client()
    {
        string message;
        int key;
        int increment;

        send_message(key);
        message = receive_message();
        // assign to increment the value to be added
        send_message(increment);
        message = receive_message();
    }

Figure 5-10. Modified implementation of readers and writers using wait-free synchronization

Table 5-1. Linux System specifications

    Processor         Dual Intel Pentium III 1 GHz
    Memory            1 Gigabyte of RAM
    Motherboard       ABIT VP-6
    SCSI Controller   Adaptec SCSI Card 39160
    Hard disk         36 Gbyte Ultra SCSI hard disk


Table 5-2. Mac OS X System specifications

    Processor         Dual Motorola Power PC G4 1 GHz
    Memory            1.5 Gigabyte of RAM
    SCSI Controller   Ultra SCSI Card
    Hard disk         72 Gbyte Ultra SCSI hard disk

The Intel Pentium III processor is a Complex Instruction Set Computer (CISC) architecture. As part of the instruction set, it supports the compare-and-swap instruction, calling it cmpxchg (compare and exchange) [INT97]. The use of the instruction is as follows:

    bool cas(int &source, int old_value, int new_value)
    {
        int result;

        asm volatile("movl %2, %%eax\n\t"      // eax = old_value
                     "cmpxchgl %3, %1\n\t"     // compare eax with source
                     "movl %%eax, %0"          // result = eax
                     : "=r" (result), "+m" (source)
                     : "r" (old_value), "r" (new_value)
                     : "eax", "cc", "memory");
        return old_value == result;
    }

Figure 5-11. The compare-and-swap instruction for the Intel architecture

The value of old_value is stored in the register eax. Next, the compare-and-swap instruction compares the value in eax with the value of source. If they are identical, source is assigned the value of new_value. If the operation is unsuccessful, eax is assigned the current value of source. The last instruction assigns the value of the register eax to the variable result.

The Motorola Power PC G4 is a Reduced Instruction Set Computer (RISC) architecture. Given the reduced-instruction-set nature of the processor, the compare-and-swap instruction is not actually implemented in the instruction set. This is in keeping with the philosophy of RISC architectures, which is to implement only the most basic operations.


If a user wishes to use a complex instruction, the architecture provides basic instructions from which the more complex instruction can be built. IBM maintains a web page [IBM99] with sample implementations of complex synchronization primitives, including the compare-and-swap instruction. The implementation of the compare-and-swap instruction on the Power PC is shown in Figure 5-12. Like the traditional compare-and-swap instruction, this group of instructions loads the value of source and compares it with old_value. If they are identical, source is assigned the value of new_value. If the operation is unsuccessful, old_value is assigned the value read from source.

    bool cas(int *source, int old_value, int new_value)
    {
        int temp;

        asm volatile(
            "loop:  lwarx   %0, 0, %1  \n"   // Load and reserve
            "       cmpw    %0, %2     \n"   // Are the first two operands equal?
            "       bne-    exit       \n"   // Skip if not equal
            "       stwcx.  %3, 0, %1  \n"   // Store new value if still reserved
            "       bne-    loop       \n"   // Loop if lost reservation
            "exit:  mr      %2, %0     \n"   // Return value from storage
            : "=&b" (temp)
            : "r" (source), "Ir" (old_value), "r" (new_value));
        return (temp == old_value) ? true : false;
    }

Figure 5-12. The compare-and-swap instruction for the Power PC architecture

The client was implemented in Java. The client creates threads and then communicates with the server, requesting to modify a variable while executing a for loop. For our experiments we tested low contention (2 clients), moderately low


contention (4 clients), moderately high contention (8 clients), and high contention (16 clients). For each set, we executed 10 iterations and averaged the time taken to perform the computations. The results are shown in Figures 5-13 through 5-20.

The disparity between the Mac OS X and Linux results prompted careful study using getrusage [STE98]. The results derived from getrusage showed that there were 202 page faults and 24 page reclaims in the execution of the server in Linux, while there were none in Mac OS X. Apparently the reason that Mac OS X outperformed Linux is that OS X has better mechanisms than Linux for handling memory and secondary memory. Although this does identify an area of improvement in the Linux OS, this difference is irrelevant to the issue of wait-free synchronization. Any improvement that wait-free synchronization provides will apply to either system.

[Plot: 2 Threads Linux. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-13. Two threads of execution results in Linux
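For readers who want to repeat the page-fault measurement, the following sketch shows one way to read the two counters from getrusage on a POSIX system. The `FaultCounts` struct and `pageFaultsSoFar` function are hypothetical names introduced here; only the `getrusage` call and the `ru_majflt` and `ru_minflt` fields come from the system interface.

```cpp
#include <sys/resource.h>

// Hypothetical result type for this sketch.
struct FaultCounts {
    long major;   // page faults that required I/O
    long minor;   // page reclaims serviced without I/O
    bool ok;      // false if getrusage failed
};

// Read the fault counters accumulated so far by the calling process.
FaultCounts pageFaultsSoFar()
{
    struct rusage usage {};
    FaultCounts out{0, 0, false};
    if (getrusage(RUSAGE_SELF, &usage) == 0) {
        out.major = usage.ru_majflt;
        out.minor = usage.ru_minflt;
        out.ok = true;
    }
    return out;
}
```

Calling the function before and after the server loop and subtracting the two readings would give the counts attributable to the loop itself.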


[Plot: 2 Threads Mac OS X. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-14. Two threads of execution results in Mac OS X

[Plot: 4 Threads Linux. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-15. Four threads of execution results in Linux


[Plot: 4 Threads Mac OS X. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-16. Four threads of execution results in Mac OS X

[Plot: 8 Threads Linux. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-17. Eight threads of execution results in Linux


[Plot: 8 Threads Mac OS X. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-18. Eight threads of execution results in Mac OS X

[Plot: 16 Threads Linux. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-19. Sixteen threads of execution results in Linux


[Plot: 16 Threads Mac OS X. Time (ms) versus iterations for the Locks clock and Hybrid clock series.]

Figure 5-20. Sixteen threads of execution results in Mac OS X

The previous results were obtained with writers contending among each other. The next study examined the contention and interaction between readers and writers. This experiment was performed using 4 threads, with the combinations of 1 writer and 3 readers, 2 writers and 2 readers, 3 writers and 1 reader, and 4 writers and 0 readers. The case of 4 readers was not tested because neither the writer spin-lock nor the wait-free synchronization techniques would be used in that instance. These tests were conducted using Mac OS X and each thread ran for 100 iterations. Figures 5-21, 5-22, and 5-23 show the results.

With up to 2 writers, the performance of the wait-free version and the lock version was similar. When there were 3 or more writers, the writers that used wait-free synchronization performed significantly better than the writers that used locks. With one reader, the wait-free and lock versions performed very similarly. When there were 2 or more readers, the lock version of the reader performed better than the reader that used wait-free synchronization.
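The hybrid scheme compared in these experiments can be sketched in modern C++ as follows. This is an illustration under stated assumptions, not the RWLock class of Figure 5-2: `Bucket`, `hybridIncrement`, and `removeLink` are hypothetical names, `std::shared_mutex` (C++17) stands in for the readers/writers lock, and `std::atomic` stands in for the compare-and-swap instruction.

```cpp
#include <atomic>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for one bucket of the hash table.
struct Bucket {
    std::shared_mutex rw;          // readers/writers lock
    std::atomic<int> counter{0};   // variable updated with compare-and-swap
    bool present = true;           // stands in for "the link still exists"
};

// Simple operation: take the lock in shared (reader) mode so other simple
// updates can proceed concurrently, then modify the counter with a CAS loop.
bool hybridIncrement(Bucket& b, int x)
{
    std::shared_lock<std::shared_mutex> guard(b.rw);
    if (!b.present)
        return false;
    int old = b.counter.load();
    // A failed attempt refreshes `old`, and the loop retries.
    while (!b.counter.compare_exchange_weak(old, old + x)) {
    }
    return true;
}

// Complex operation: take the lock in exclusive (writer) mode.
void removeLink(Bucket& b)
{
    std::unique_lock<std::shared_mutex> guard(b.rw);
    b.present = false;
    b.counter.store(0);
}
```

Because increments hold the lock only in shared mode, several writers performing simple updates do not block one another, which is the situation in which the hybrid version performed best in the measurements above.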


[Plot: Writers, 4 Threads Mac OS X. Time (ms) for the Locks clock (writers) and Hybrid clock (writers) series over the configurations 1 writer / 3 readers, 2 writers / 2 readers, 3 writers / 1 reader, 4 writers / 0 readers, and 4 writers / 4 readers.]

Figure 5-21. Writers results

[Plot: Readers, 4 Threads Mac OS X. Time (ms) for the Locks clock (readers) and Hybrid clock (readers) series over the configurations 1 writer / 3 readers, 2 writers / 2 readers, 3 writers / 1 reader, 4 writers / 0 readers, and 4 writers / 4 readers.]

Figure 5-22. Readers results


[Plot: Readers and Writers Average, 4 Threads Mac OS X. Time (ms) for the Locks clock and Hybrid clock series over the configurations 1 writer / 3 readers, 2 writers / 2 readers, 3 writers / 1 reader, 4 writers / 0 readers, and 4 writers / 4 readers.]

Figure 5-23. Readers and writers average results

Clearly, wait-free synchronization outperforms lock synchronization for multiple writers. These results are also consistent with the implementation of the Linux low latency patch. Because the patch relied on wait-free synchronization in only a few portions of the code, the contention in these functions was low. The performance of locks and wait-free synchronization was similar for small numbers of iterations. Once there are many clients and/or iterations with multiple writers, wait-free synchronization outperforms traditional locks.

5.5 Databases

In a typical relational database, relations are usually represented using one of four storage structures: B+-tree, ISAM, hash, and heap. Hash tables are an effective storage structure for many situations because hashing does not incur the indexing overhead associated with structures such as B+-trees. The technique of hashing allows the database to access data without the need to look up an index.


To perform an insertion, the hashing algorithm processes a tuple's key and the tuple is then inserted into the page identified by the calculation. For a find operation, the hash function processes the input key to calculate the page offset and then scans the keys on the identified page to find the record(s). This stands in contrast to indexing operations, where possibly multiple levels of index are processed to identify the correct page on which the tuple(s) reside. For exact match searches, hash tables clearly require fewer resources, because all the database must maintain is one table and the key is computed using a hash function.

A distributed hash table can be implemented in databases that run on uniprocessors, multiprocessors, and distributed systems. The main applications in a database are searching, such as a web search engine, and modifying, such as an airline reservation system. A web search engine that performs searches will not benefit from wait-free synchronization. On the other hand, an airline reservation system that reserves seats is an example of a system that can benefit from wait-free synchronization. An airline reservation system is likely to spend approximately 20% of its computational time modifying a seat reservation variable [SIL01].

Our studies of wait-free synchronization in a distributed hash table show that a database has the potential to benefit from wait-free synchronization. Wait-free synchronization would be ideal in the scenario of a database that has a small number of writers in which the use of locks would degrade the performance of the system. For complex operations on a database, such as modifying multiple rows, performing all-or-none operations on multiple tuples, or deleting tuples on multiple pages, locks need to be used within the database.
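The seat reservation case mentioned above is a simple update of a single variable, so it can be sketched with one compare-and-swap loop. The function name `reserveSeat` and the use of a `std::atomic<int>` seat counter are assumptions made for illustration; a real reservation system would involve considerably more state.

```cpp
#include <atomic>

// Hypothetical seat counter update: take one seat if any remain.
// Returns true when a seat was reserved, false when the flight is full.
bool reserveSeat(std::atomic<int>& seatsAvailable)
{
    int seats = seatsAvailable.load();
    while (seats > 0) {
        // Succeeds only if no other client changed the counter in between;
        // on failure `seats` is refreshed and the check is repeated.
        if (seatsAvailable.compare_exchange_weak(seats, seats - 1))
            return true;
    }
    return false;
}
```

The availability check and the decrement are tied together by the compare-and-swap, so the counter cannot go negative even when several clients reserve at once.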


CHAPTER 6
CONCLUSION

We have identified that wait-free synchronization is a technique that merits further research and application in real software development. There was no performance gain or penalty for using wait-free synchronization in the Linux kernel. The real gain is that the modified patch using the hybrid design increases the reliability of the code by making it less likely that the system is brought into a deadlock state if a process dies while holding a lock. Even though these are small sections of code in which a crash is unlikely, this reliability gain should not be ignored. This is especially true since there are no performance penalties associated with the use of wait-free synchronization.

In the distributed application, the performance of locks and wait-free synchronization was similar for small numbers of iterations. Once there were a high number of clients and/or iterations with multiple writers, wait-free synchronization outperformed traditional locks. It should be of interest to note that on the Power PC architecture, which does not provide a native compare-and-swap instruction, wait-free synchronization still outperformed the traditional use of locks.

In the application of these ideas in software development, it can be noted that the hash table is an ideal candidate for wait-free synchronization. Note that unlike B-trees, which require multiple locks, it is possible to implement a hash table that can be modified with a single lock. A hash table is an efficient data structure that can be used for efficient file input and output. Also, using hash tables is the best technique for performing exact match searches in a database. The


benefits of wait-free synchronization in the design of these software systems should not be ignored. It is our belief that this is an important tool that, if used properly, can increase the reliability and the fault tolerance of software systems without any performance penalties. In systems of high contention, wait-free synchronization will provide better performance than traditional lock synchronization.

Future Research. Future research in this area will consist of continuing to study the role of wait-free synchronization in the implementation of concurrent systems. It should be of interest to continue examining the role of wait-free synchronization in the implementation of more complex operations for the compare-and-swap instruction.

With the advent of object-oriented languages, it is quite feasible to implement a class that includes shared variables and, within this class, a software implementation of the compare-and-swap instruction. An example is shown in Figure 6-1. This type of data structure will make it feasible to use wait-free synchronization in the implementation of real software systems. The reason for this is that when a lock is being held, most of the overhead is caused by operations that are performed on the data types. If the operations are limited to compares and assignments, it is our belief that contention for resources will be reduced and system performance will increase.


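    class shared
    {
    private:
        // lock() and unlock() are assumed to acquire and release a
        // lock that protects this object.
        void copy(const shared &newValue)
        {
            Int = newValue.Int;
            Char = newValue.Char;
            Double = newValue.Double;
        }

    public:
        int Int;
        char Char;
        double Double;

        // Replace the contents with newValue only if they still
        // match expected.
        bool CAS(const shared &expected, const shared &newValue)
        {
            lock();
            if ((Int == expected.Int) && (Char == expected.Char) &&
                (Double == expected.Double))
            {
                copy(newValue);
                unlock();
                return true;
            }
            unlock();
            return false;
        }
    };

Figure 6-1. Sample code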
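One way the idea of Figure 6-1 might look with standard library facilities is sketched below. The names `Fields`, `SharedRecord`, `load`, and `compareAndSwap` are hypothetical, and `std::mutex` is assumed as the lock that the figure leaves unspecified. The values are kept in a separate plain struct so that expected and new values can be passed around freely while the lock stays inside the shared object.

```cpp
#include <mutex>

// Plain value type holding the shared variables (hypothetical).
struct Fields {
    int Int;
    char Char;
    double Double;
};

bool operator==(const Fields& a, const Fields& b)
{
    return a.Int == b.Int && a.Char == b.Char && a.Double == b.Double;
}

// Shared object offering a software compare-and-swap over all fields.
class SharedRecord {
public:
    explicit SharedRecord(const Fields& initial) : value_(initial) {}

    // Take a snapshot of the current contents.
    Fields load()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

    // Replace the contents with `desired` only if they still equal `expected`.
    bool compareAndSwap(const Fields& expected, const Fields& desired)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!(value_ == expected))
            return false;
        value_ = desired;
        return true;
    }

private:
    std::mutex mutex_;   // held only for the compare and the assignment
    Fields value_;
};
```

A caller would take a snapshot with `load()`, compute the new contents outside the lock, and then call `compareAndSwap` with the snapshot as the expected value, retrying if it returns false.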


APPENDIX A
LINUX LOW LATENCY PATCH

--kernel/sched.c
@@ -1288,3 +1299,74 @@ void __init sched_init(void)
 	atomic_inc(&init_mm.mm_count);
 	enter_lazy_tlb(&init_mm, current, cpu);
 }
+
+#if LOWLATENCY_NEEDED
+#if LOWLATENCY_DEBUG
+
+static struct lolat_stats_t *lolat_stats_head;
+static spinlock_t lolat_stats_lock = SPIN_LOCK_UNLOCKED;
+
+void set_running_and_schedule(struct lolat_stats_t *stats)
+{
+/*	spin_lock(&lolat_stats_lock);
+	if (stats->visited == 0) {
+		stats->visited = 1;
+		stats->next = lolat_stats_head;
+		lolat_stats_head = stats;
+	}
+	stats->count++;
+	spin_unlock(&lolat_stats_lock);
+
+	if (current->state != TASK_RUNNING)
+		set_current_state(TASK_RUNNING);
+	schedule(); */
+	int temp;
+
+	if (stats->visited == 0) {
+		stats->visited = 1;
+		stats->next = lolat_stats_head;
+		lolat_stats_head = stats;
+	}
+	stats->count++;
+	spin_unlock(&lolat_stats_lock); */
+
+	if (cas(&stats->visited, 0, 1))
+	{
+		stats->next = lolat_stats_head;
+		lolat_stats_head = stats;
+		init = 1;
+	}
+	else
+	{


+		while (init == 0)
+			;
+	}
+	do {
+		temp = stats->count;
+	}
+	while (!cas(&stats->count, temp, temp + 1));
+	if (current->state != TASK_RUNNING)
+		set_current_state(TASK_RUNNING);
+	schedule();
+}

--mm/shmem.c
@@ -30,6 +30,7
 #include
+#include "atomic.h"
@@ -12,6 +12,7 @@ shmem_delete_inode(struct inode * inode)
 static void shmem_delete_inode(struct inode * inode)
 {
+	int temp;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
@@ -19,6 +19,7 @@ shmem_delete_inode(struct inode * inode)
 		shmem_truncate (inode);
 	}
 	spin_lock (&sbinfo->stat_lock);
 	sbinfo->free_inodes++;
 	spin_unlock (&sbinfo->stat_lock);
+	do {
+		temp = sbinfo->free_inodes;
+	}
+	while (!cas(&sbinfo->free_inodes, temp, temp + 1));
@@ -261,6 +261,7 @@ shmem_recalc_inode(struct inode * inode)
 	unsigned long freed;
+	unsigned long temp;
 	freed = (inode->i_blocks/BLOCKS_PER_PAGE) -
 		(inode->i_mapping->nrpages + SHMEM_I(inode)->swapped);
 	if (freed){
 		struct shmem_sb_info * sbinfo = SHMEM_SB(inode->i_sb);
 		inode->i_blocks -= freed*BLOCKS_PER_PAGE;
 		spin_lock (&sbinfo->stat_lock);
 		sbinfo->free_blocks += freed;
 		spin_unlock (&sbinfo->stat_lock);
+		do {
+			temp = sbinfo->free_blocks;
+		}


+		while (!cas(&sbinfo->free_blocks, temp, temp + freed));
 	}


APPENDIX B SOCKET LIBRARY // Socket.h // (C) Copyright 2002 Joseph S. Berrios // // Permission to copy, use, modify, sell and distribute this // software is granted provided this copyright notice appears // in all copies. // This software is provided "as is" without express or // implied warranty, and with no claim as to its suitability // for any purpose. #ifndef SOCKET_H #define SOCKET_H #include #include #include #include #include #include #include #include // mac OS X #include #include "platform.h" #ifndef in_port_t typedef u_int16_t in_port_t; #endif // mac OS X end namespace net { static const int BOOST_BACKLOG_NUM = 5; class ProtocolImpl { protected: int domain; public: int getDomain() { return domain; } }; class ipv4 : public ProtocolImpl { sockaddr_in addr; 62


63 public: // Constructor ipv4() { setDomain(); } void setDomain(int x = AF_INET) { domain = x; } // Build a internet socket name based on a hostname and a port # int constr_name(const char *hostnm, int port); // Convert an IP address to a character string host name char *ip2name(); // get socket name SOCKET getsockname(SOCKET sid, int len); // get port number in_port_t getPort() { return addr.sin_port; } // port update void portUpd(int *port_p); // assign a UNIX or an internet name to a socket SOCKET bind(SOCKET sid, int len); // A server socket accepts a client connection request SOCKET accept(SOCKET sid); // A server socket accepts a client connection request SOCKET connect(SOCKET sid, int len); // reads a message using a datagram int recvfrom(SOCKET sid, char* buf, size_t len, int flag); // wrates a message using a datagram int sendto(SOCKET sid, const char* buf, size_t len, int flag, int size); }; // define the exception classes class SocketExceptions : public exception { protected: const char* message(const char *on_what) { std::string temp = on_what; temp += ": "; temp += strerror(errno); return temp.c_str(); } public: virtual const char* what() = 0; };


64 class SocketException : public SocketExceptions { public: const char* what() { return message("Socket::SocketException"); } }; class AcceptException : public SocketExceptions { public: const char* what() { return message("Socket::AcceptException"); } }; class ConnectException : public SocketExceptions { public: const char* what() { return message("Socket::ConnectException"); } }; class HostnameException : public SocketExceptions { public: const char* what() { return message("Socket::HostnameException"); } }; class BindException : public SocketExceptions { public: const char* what() { return message("Socket::BindException"); } }; class SocknameException : public SocketExceptions { public: const char* what() { return message("Socket::SocknameException"); } }; class ListenException : public SocketExceptions { public: const char* what() { return message("Socket::ListenException"); } }; class IOException : public SocketExceptions { public: const char* what() { return message("Socket::IOException"); } }; template class SocketSuper { protected: SocketSuper() {


65 } SocketType& Self() { return static_cast(*this); } template void change(T* old) { old = this; } }; template class Socket : public SocketSuper, public SocketPlatform { SOCKET _id; // socket id int domain; // socket domain int socktype; // socket type protected: SOCKET sid; // socket descriptor SOCKET rc; // member function return status code int bufLength; // buffer length int protocol; // protocol Protocol ipv; // protocol pointer (ipv4 or ipv6) std::string daddress; // datagram address // Build a UNIX domain name based on a pathname int constr_name(sockaddr &addr, const char* const Pathnm); void SocketInit(int type, int prot); void SocketBind(const char* name = NULL, int port = 0); Socket(int type, int prot); public: // constructor Socket() { init(); } template Socket(Socket& s) { init(); change(&s); } // destructor ~Socket(); // discard a socket void init() { Self().init(); } // return a socket's id # int SocketId(); // assign a UNIX or an internet name to a socket void bind(const char* address = NULL, int port = 0); void bind(const std::string address, int port = 0); // A client initiates connection request to a server socket


66 void connect(const char* hostnm, int port = -1); void connect(const std::string hostnm, int port = -1); // A client initiates connection request to a server socket //void connect(); // close connection void close(); // resize buffer length void setBuffer(int x); int bufferSize() { return bufLength; } // virtual functions for I/O void write(const char* buf) { Self().write(buf); } void write(const std::string buf) { Self().write(buf); } void read(char* buf) { Self().read(buf); } void read(std::string& buf) { Self().read(buf); } // shutdown connection of a socket int shutdown(int mode = 2); }; // class Socket template class StreamSocket : public Socket > { int _nsid; // socket id int _port; // socket port public: // constructor StreamSocket(); StreamSocket(const char* address, int port); StreamSocket(const std::string address, int port); void init(); // assign a UNIX or an internet name to a socket void bind(const char* address = NULL, int port = 0); // A server socket accepts a client connection request


67 int accept(char* address, int* port_p); int accept() { return accept(0, 0); } int accept(std::string& address, int* port_p); // writes a message to a connected stream socket void write(const char* buf); void write(const std::string buf); // reads a message from a connected stream socket void read(char* buf); void read(std::string& buf); }; // class StreamSocket template class ServerStreamSocket : public StreamSocket { public: // constructor ServerStreamSocket(int port); }; template class DatagramSocket : public Socket > { int dflag; // datagram flag int dport; // datagram port public: // constructor DatagramSocket(); DatagramSocket(const char* address, int port); DatagramSocket(const std::string address, int port); void init(); // assign a UNIX or an internet name to a socket void bind(const char* address = NULL, int port = 0); // writes a message to a connected datagram socket void write(const char* buf); void write(const std::string buf); // reads a message from a connected datagram socket void read(char* buf); void read(std::string& buf); }; // class DatagramSocket // Build a UNIX domain name based on a pathname template int Socket ::constr_name(sockaddr& addr, const char* Pathnm) { addr.sa_family = domain; strcpy(addr.sa_data, Pathnm); return sizeof(addr.sa_family) + strlen(Pathnm) + 1;


68 } // Socket constructor template Socket::Socket(int type, int prot) : socktype(type), bufLength(80), protocol(prot) { domain = -1; if (prot == -1) domain = AF_LOCAL; platformInit(); if (domain == AF_LOCAL) ipv.setDomain(AF_LOCAL); else domain = ipv.getDomain(); sid = socket(domain, socktype, protocol); if (platformInvalidSocket(sid)) throw SocketException(); } template void Socket::SocketInit(int type, int prot) { socktype = type; bufLength = 80; protocol = prot; domain = -1; if (prot == -1) domain = AF_LOCAL; platformInit(); if (domain == AF_LOCAL) ipv.setDomain(AF_LOCAL); else domain = ipv.getDomain(); sid = socket(domain, socktype, protocol); if (platformInvalidSocket(sid)) throw SocketException(); } // Socket destructor template Socket::~Socket() { shutdown(); platformCloseSocket(sid); } // assign a UNIX or an internet name to a socket template void Socket::SocketBind(const char* address, int port)


69 { if (port == -1) { sockaddr addr; int len = constr_name(addr, address); rc = net::bind(sid, &addr, len); if (platformSocketError(rc)) throw BindException(); } else { int len = ipv.constr_name(address, port); rc = ipv.bind(sid, len); if (platformSocketError(rc)) throw BindException(); rc = ipv.getsockname(sid, len); if (platformSocketError(rc)) throw BindException(); } } // assign a UNIX or an internet name to a socket template void Socket::bind(const char* address, int port) { Self().bind(address, port); } // assign a UNIX or an internet name to a socket template void Socket::bind(const std::string address, int port) { bind(address.c_str(), port); } // A client initiates connection request to a server socket template void Socket::connect(const char* hostnm, int port) { if (port == -1) { struct sockaddr addr; int len = constr_name(addr, hostnm); rc = net::connect(sid, static_cast (&addr), len); if (platformSocketError(rc)) throw ConnectException(); } else { int len = ipv.constr_name(hostnm, port);


70 rc = ipv.connect(sid, len); if (platformSocketError(rc)) throw ConnectException(); } _id = rc; } // A client initiates connection request to a server socket template void Socket::connect(const std::string hostnm, int port) { connect(hostnm.c_str(), port); } // close connection template void Socket::close() { platformClose(_id); } // resize buffer length template void Socket::setBuffer(int x) { bufLength = x; } // shutdown connection of a socket template int Socket::shutdown(int mode) { return net::shutdown(sid, mode); } // StreamSocket constructor template StreamSocket::StreamSocket() : Socket (SOCK_STREAM, 0), _nsid(-1), _port(-1) { } template void StreamSocket::init() { SocketInit(SOCK_STREAM, 0); _nsid = -1; _port = -1; } // StreamSocket constructor template


StreamSocket::StreamSocket(const char* address, int port)
    : Socket (SOCK_STREAM, 0), _nsid(-1), _port(-1)
{
    connect(address, port);
}

// StreamSocket constructor
template
StreamSocket::StreamSocket(const std::string address, int port)
    : Socket (SOCK_STREAM, 0), _nsid(-1), _port(-1)
{
    connect(address.c_str(), port);
}

// bind a server socket to an address and start listening
template
void StreamSocket::bind(const char* address, int port)
{
    _port = port;
    SocketBind(address, port);
    rc = listen(sid, BOOST_BACKLOG_NUM);
    if (platformSocketError(rc))
        throw ListenException();
}

// A server socket accepts a client connection request
template
int StreamSocket::accept(char* address, int* port_p)
{
    if (!address)
    {
        _nsid = net::accept(sid, 0, 0);
        return _nsid;
    }
    if (!port_p || *port_p == -1)
    {
        // UNIX domain socket
        sockaddr addr;
        int size = sizeof(addr);
#ifdef linux
        rc = net::accept(sid, &addr, (socklen_t*) &size);
#else
        rc = net::accept(sid, &addr, &size);
#endif
        // copy the peer name only when accept succeeded
        if (!platformInvalidSocket(rc))
        {
            strncpy(address, addr.sa_data, size);
            address[size] = '\0';
        }


    }
    else
    {
        // Internet domain socket
        rc = ipv.accept(sid);
        // report the peer only when accept succeeded
        if (!platformInvalidSocket(rc))
        {
            if (address)
                strcpy(address, ipv.ip2name());
            if (port_p)
                *port_p = ntohs(ipv.getPort());
        }
    }
    if (_port == -1)
        _nsid = -1;
    else
        _nsid = rc;
    if (platformInvalidSocket(rc))
        throw AcceptException();
    return rc;
}

// A server socket accepts a client connection request
template
int StreamSocket::accept(std::string& address, int* port_p)
{
    // size the buffer before copying into it; it must hold either
    // the caller's string or the peer name written by accept()
    std::vector<char> temp(address.size() + bufLength + 1);
    strcpy(&temp[0], address.c_str());
    int x = accept(&temp[0], port_p);
    address = &temp[0];
    return x;
}

// writes a message to a connected stream socket
template
void StreamSocket::write(const char* buf)
{
    int x;
    if (buf != NULL)
    {
        // send the terminating '\0' along with the text
        int len = strlen(buf) + 1;
        x = net::send(_nsid == -1 ? sid : _nsid, buf, len, 0);
    }
    else
        x = net::send(_nsid == -1 ? sid : _nsid, " \n", 2, 0);
    if (platformSocketError(x))
        throw IOException();
}


73 // writes a message to a connected stream socket template void StreamSocket::write(const std::string buf) { write(buf.c_str()); } // reads a message from a connected stream socket template void StreamSocket::read(char* buf) { int x = net::recv(_nsid == -1 ? sid : _nsid, buf, bufLength, 0); if (platformSocketError(x)) throw IOException(); } // reads a message from a connected stream socket template void StreamSocket::read(std::string& buf) { std::vector temp(bufLength); int x = net::recv(_nsid == -1 ? sid : _nsid, &temp[0], bufLength, 0); if (platformSocketError(x)) throw IOException(); buf = &temp[0]; } // ServerStreamSocket constructor template ServerStreamSocket::ServerStreamSocket(int port) : StreamSocket() { bind(NULL, port); } // DatagramSocket constructor template DatagramSocket::DatagramSocket() : Socket (SOCK_DGRAM, 0) { dflag = 0; } // DatagramSocket constructor template void DatagramSocket::init() { SocketInit(SOCK_DGRAM, 0); dflag = 0; }


74 // DatagramSocket constructor template DatagramSocket::DatagramSocket(const char* address, int port) : Socket(SOCK_DGRAM, 0) { dflag = 0; daddress = address; dport = port; } // DatagramSocket constructor template DatagramSocket::DatagramSocket(const std::string address, int port) : Socket(SOCK_DGRAM, 0) { dflag = 0; daddress = address; dport = port; } template void DatagramSocket::bind(const char* address, int port) { dport = port; if (address != NULL) { daddress.resize(strlen(address) + 1); daddress = address; } try { SocketBind(address, port); } catch(BindException& e) { SocketBind(address, 0); } } // writes a message to a connected datagram socket template void DatagramSocket::write(const char* buf) { int rc; int len = strlen(buf) + 1; if (dport == -1) { // UNIX domain socket sockaddr addr; int size = constr_name(addr, daddress.c_str()); rc = net::sendto(sid, buf, len, dflag, &addr, size); }


    else
    {
        // Internet domain socket
        if (daddress.empty())
        {
            // use local host
            char temp[80];
            if (gethostname(temp, sizeof temp) < 0)
                throw IOException();
            hostent *hp = gethostbyname(temp);
            if (hp == 0)
                throw IOException();
            if (strstr(typeid(ipv).name(), "ipv4"))
                daddress = inet_ntoa(*(in_addr *) hp->h_addr_list[0]);
        }
        int size = ipv.constr_name(daddress.c_str(), dport);
        rc = ipv.sendto(sid, buf, len, dflag, size);
    }
    if (platformSocketError(rc))
        throw IOException();
}

// writes a message to a connected datagram socket
template
void DatagramSocket::write(const std::string buf)
{
    write(buf.c_str());
}

// reads a message from a connected datagram socket
template
void DatagramSocket::read(char* buf)
{
    if (!dport || dport == -1)
    {
        // UNIX domain socket
        sockaddr addr;
        int size = sizeof(addr);
#ifdef linux
        if ((rc = net::recvfrom(sid, buf, bufLength, dflag, &addr,
                                (socklen_t *) &size)) > -1
            && daddress.empty())
#else
        if ((rc = net::recvfrom(sid, buf, bufLength, dflag, &addr,
                                &size)) > -1
            && daddress.empty())
#endif
        {
            // one extra byte for the terminator written below
            char *temp = new char[rc + 1];
            strncpy(temp, addr.sa_data, rc);
            temp[rc] = '\0';
            daddress = temp;
            delete [] temp;
        }
    }
    else
    {
        // Internet domain socket
        rc = ipv.recvfrom(sid, buf, bufLength, dflag);
        if (!platformSocketError(rc))


        {
            if (!daddress.empty())
            {
                daddress.resize(strlen(ipv.ip2name()) + 1);
                daddress = ipv.ip2name();
            }
            if (dport > 0) // check this out
            {
                int t = dport;
                ipv.portUpd(&t);
                if (t != dport)
                    dport = t;
            }
        }
    }
    if (platformSocketError(rc))
        throw IOException();
}

// reads a message from a connected datagram socket
template
void DatagramSocket::read(std::string& buf)
{
    std::vector<char> temp(bufLength);
    read(&temp[0]);
    buf = &temp[0];
}

}
#endif


77 // Socket.C // (C) Copyright 2002 Joseph S. Berrios // // Permission to copy, use, modify, sell and distribute this // software is granted provided this copyright notice appears // in all copies. // This software is provided "as is" without express or // implied warranty, and with no claim as to its suitability // for any purpose. #include "Socket.h" #include namespace net { // Build a internet socket name based on a hostname and a port # int ipv4::constr_name(const char *hostnm, int port) { addr.sin_family = domain; if (!hostnm) addr.sin_addr.s_addr = INADDR_ANY; else { hostent *hp = gethostbyname(hostnm); if (hp == 0) throw HostnameException(); memcpy((char*)&addr.sin_addr, (char*)hp->h_addr, hp->h_length); } addr.sin_port = htons(port); return sizeof(addr); } // Convert an IP address to a character string host name char* ipv4::ip2name() { unsigned int laddr; if ((static_cast(laddr = inet_addr(inet_ntoa(addr.sin_addr)))) == -1) return 0; hostent *hp = gethostbyaddr((char*)&laddr, sizeof(laddr), AF_INET); if (!hp) return 0; for(char **p = hp->h_addr_list; *p != 0; p++) if (hp->h_name) return hp->h_name; return 0; } // assign a UNIX or an internet name to a socket SOCKET ipv4::bind(SOCKET sid, int len) {


78 return net::bind(sid, (sockaddr*) &addr, len); } // get socket name SOCKET ipv4::getsockname(SOCKET sid, int len) { #ifdef linux return net::getsockname(sid, (sockaddr*) &addr, (socklen_t*) &len); #else return net::getsockname(sid, (sockaddr*) &addr, &len); #endif } // A server socket accepts a client connection request SOCKET ipv4::accept(SOCKET sid) { int size = sizeof(addr); #ifdef linux return net::accept(sid, (sockaddr*) &addr, (socklen_t*) &size); #else return net::accept(sid, (sockaddr*) &addr, &size); #endif } // A server socket accepts a client connection request SOCKET ipv4::connect(SOCKET sid, int len) { return net::connect(sid, (sockaddr*) &addr, len); } // reads a message using a datagram int ipv4::recvfrom(SOCKET sid, char *buf, size_t len, int flag) { int size = sizeof(addr); #ifdef linux return net::recvfrom(sid, buf, len, flag, (sockaddr*) &addr, (socklen_t*) &size); #else return net::recvfrom(sid, buf, len, flag, (sockaddr*) &addr, &size); #endif } // writes a message using a datagram int ipv4::sendto(SOCKET sid, const char *buf, size_t len, int flag, int size) { return net::sendto(sid, buf, len, flag, (sockaddr*) &addr, size); }


79 // port update void ipv4::portUpd(int *port_p) { *port_p = ntohs(addr.sin_port); } }
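The listing above builds an IPv4 socket name with gethostbyname(), which POSIX has since marked obsolescent. As a rough sketch only, and not the author's code, the same job can be done with getaddrinfo(). The helper name makeIPv4Name is hypothetical, and the sketch assumes a POSIX system.

```cpp
#include <cassert>
#include <cstring>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

// Hypothetical helper (not in the listing): fill `out` with an IPv4
// socket name for host:port. A null host selects the wildcard address,
// mirroring the INADDR_ANY branch of ipv4::constr_name().
// Returns false if the host cannot be resolved.
bool makeIPv4Name(const char *host, int port, sockaddr_in &out)
{
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          // IPv4 only, as in class ipv4
    hints.ai_socktype = SOCK_STREAM;
    if (!host)
        hints.ai_flags = AI_PASSIVE;    // wildcard address for servers

    addrinfo *res = 0;
    // "0" is a placeholder service; the real port is set below
    if (getaddrinfo(host, "0", &hints, &res) != 0 || res == 0)
        return false;
    std::memcpy(&out, res->ai_addr, sizeof out);
    freeaddrinfo(res);
    out.sin_port = htons(static_cast<unsigned short>(port));
    return true;
}
```

A caller would pass the filled sockaddr_in to bind() or connect() in place of the addr member that constr_name() sets.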


// platform.h
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#ifndef PLATFORM_H
#define PLATFORM_H

#ifdef WIN32
namespace net
{
#include <winsock2.h>
#include <process.h>
#define AF_LOCAL AF_UNIX
typedef unsigned short in_port_t;

class SocketPlatform
{
protected:
    SocketPlatform() { }
    // get process id
    int getpid() { return _getpid(); }
    // Socket init; returns true on failure
    bool platformInit()
    {
        WORD wVersionRequested;
        WSADATA wsaData;
        int err;
        wVersionRequested = MAKEWORD(2, 2);
        err = WSAStartup(wVersionRequested, &wsaData);
        if (err != 0)
            return true;
        if (LOBYTE(wsaData.wVersion) != 2 || HIBYTE(wsaData.wVersion) != 2)
        {
            WSACleanup();
            return true;
        }
        return false;
    }
    // close connection
    void platformClose(SOCKET sid)


    {
        closesocket(sid);
    }
    // close socket
    void platformCloseSocket(SOCKET sid)
    {
        closesocket(sid);
        WSACleanup();
    }
    // socket error
    bool platformSocketError(SOCKET rc)
    {
        return (rc == SOCKET_ERROR);
    }
    // invalid socket
    bool platformInvalidSocket(SOCKET rc)
    {
        return (rc == INVALID_SOCKET);
    }
};
}
#else
namespace net
{
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <unistd.h>
#ifndef AF_LOCAL
#define AF_LOCAL AF_UNIX
#endif
typedef int SOCKET;

class SocketPlatform
{
protected:
    SocketPlatform() { }
    // socket init; returns true on failure
    bool platformInit()
    {
        return false;
    }
    // close connection
    void platformClose(SOCKET sid)
    {
        close(sid);
    }


82 // close socket void platformCloseSocket(SOCKET sid) { close(sid); } // socket error bool platformSocketError(SOCKET rc) { return (rc == -1); } // invalid socket bool platformInvalidSocket(SOCKET rc) { return (rc == -1); } }; } #endif #endif // end of platform.h


APPENDIX C
CLIENT-SERVER SIMULATION

// atomic.h
// Linux version
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#ifndef ATOMIC_H
#define ATOMIC_H
bool cas(int &source, int old_value, int new_value);
#endif

// atomic.C
// Linux version
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#include "atomic.h"

// Atomically replace source with new_value if it still holds
// old_value. Returns true if the swap took place.
bool cas(int &source, int old_value, int new_value)
{
    int result;
    // cmpxchg compares %eax (old_value) with the memory operand and
    // stores new_value on a match; %eax ends up holding the value
    // that was in memory. The lock prefix makes the instruction
    // atomic across processors.
    asm volatile ("lock; cmpxchgl %2, %1"
                  : "=a" (result), "+m" (source)
                  : "r" (new_value), "0" (old_value)
                  : "memory", "cc");
    return result == old_value;
}

// atomic.h


// Mac OS X version
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#ifndef ATOMIC_H
#define ATOMIC_H
bool cas(int *source, int old_value, int new_value);
inline bool cas(int &source, int old_value, int new_value)
{
    return cas(&source, old_value, new_value);
}
#endif

// atomic.C
// Mac OS X version
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#include "atomic.h"
#include <iostream>
using namespace std;

bool cas(int *source, int old_value, int new_value)
{
    int temp;
    asm volatile(
        "1: lwarx  %0, 0, %1 \n"   // Load and reserve
        "   cmpw   %0, %2    \n"   // Are the first two operands equal?
        "   bne-   2f        \n"   // Skip if not equal
        "   stwcx. %3, 0, %1 \n"   // Store new value if still reserved
        "   bne-   1b        \n"   // Loop if lost reservation
        "2:                  \n"   // temp holds the value from storage
        : "=&r" (temp)
        : "r" (source), "r" (old_value), "r" (new_value)
        : "cc", "memory");
    return (temp == old_value) ? true : false;
}

// hash.h
// demonstrates hash table with separate chaining
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#ifndef HASH_H
#define HASH_H
#include <iostream>
#include <string>
#include <cstdlib>   // for random numbers
#include <ctime>     // for random numbers
#include <vector>
#include "rwlock.h"
#include "errors.h"
#include "atomic.h"
#include <cstdio>
#include <pthread.h>
using namespace std;

inline char* itoa(int x)
{
    char *temp = new char[12];   // room for sign, 10 digits, and '\0'
    sprintf(temp, "%d", x);
    return temp;
}

class Link
{                                // (could be other items)
public:
    pthread_mutex_t mutex;       // mutex variable for sem
    int key;                     // key
    int data;                    // shared variable
    Link* pNext;                 // next link in list
    Link(int it);                // constructor
    string displayLink();        // display this link
}; // end class Link

class SortedList
{
    Link *pFirst;                // ref to first list item
    RWLock rw;
public:


86 SortedList(); // constructor void insert(Link* pLink); // insert link, in order void remove(int key); // delete key Link* find(int key); // find link bool boolFind(int key); // find link string displayList(); int dataVal(int key); bool modCAS(int key, int old_data, int new_data); int modLock(int key); void modUnlock(int key, int data); }; struct threads_t { int id; vector *hashArray; }; struct tInput { Link* link; int id; }; struct tOutput { int key; int id; Link* link; bool boolean; int value; bool done; int old_data; int new_data; int data; tOutput() { done = false; } bool isDone() { return done; } }; class HashTable { vector hashArray; // vector of lists pthread_t *thread; threads_t *tag; int arraySize; int numThreads; int status; public: HashTable() { } HashTable(int size, int tNum); // constructor ~HashTable() { void *thread_result; for(int x = 0; x < numThreads; x++) { cout << "thread " << x << ": terminated\n";


            status = pthread_join(thread[x], &thread_result);
        }
    }
    void init(int size, int tNum);
    void displayTable();
    string displayList(int x);
    int listSize() { return arraySize; }
    int hashFunc(int key);
    int hashThread(int key);
    void insert(Link* pLink);    // insert a dataItem
    void remove(int key);        // remove a DataItem
    Link* find(int key);         // find item with key
    Link* FIND(int key);
    bool boolFind(int key);      // find item with key
    int dataVal(int key);
    bool modCAS(int key, int old_data, int new_data);
    int modLock(int key);
    void modUnlock(int key, int data);
}; // end of class HashTable

void *cell_thread(void *arg);
#endif // HASH_H
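The header above declares two ways to update a link's data: modCAS, and the modLock/modUnlock pair. As a minimal sketch of the compare-and-swap style only, and not the author's code, the loop below uses std::atomic from standard C++ in place of the hand-written cas(). The names Cell, addWithCas, and runWorkers are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for Link::data; not part of the listing.
struct Cell {
    std::atomic<int> data{0};
};

// Add `increment` to the cell with a compare-and-swap retry loop.
// Returns the value this call stored.
int addWithCas(Cell &cell, int increment)
{
    int old_value = cell.data.load();
    // On failure compare_exchange_weak reloads old_value with the
    // current contents, so the loop just recomputes and retries.
    while (!cell.data.compare_exchange_weak(old_value,
                                            old_value + increment)) {
    }
    return old_value + increment;
}

// Run `threads` workers that each apply `perThread` increments of 1
// and return the final value of the shared cell.
int runWorkers(int threads, int perThread)
{
    assert(threads >= 0 && perThread >= 0);
    Cell cell;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&cell, perThread] {
            for (int i = 0; i < perThread; ++i)
                addWithCas(cell, 1);
        });
    for (std::thread &th : pool)
        th.join();
    return cell.data.load();
}
```

No thread ever holds a lock here, so a worker that stalls in the middle of an update cannot block the others. That property is what the lock-based path gives up.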


88 // hash.C // demonstrates hash table with separate chaining // (C) Copyright 2002 Joseph S. Berrios // // Permission to copy, use, modify, sell and distribute this // software is granted provided this copyright notice appears // in all copies. // This software is provided "as is" without express or // implied warranty, and with no claim as to its suitability // for any purpose. #include "hash.h" #include "rwlock.h" #include "atomic.h" #include "hash-lock.h" #include #include #include using namespace std; vector > input; vector > remover; vector > finder; vector > caser; vector > locker; vector > unlocker; vector mutexLocker; vector mutexUnlocker; Link::Link(int it) : key(it), data(0) { int status; status = pthread_mutex_init(&mutex, NULL); if (status != 0) ; } string Link::displayLink() { string s(itoa(key)); s += "->"; s += itoa(data); s += " "; return s; } SortedList::SortedList() // constructor { pFirst = NULL; } void SortedList::insert(Link* pLink) // insert link, in order


{
    int key = pLink->key;
    Link* pPrevious = NULL;                 // start at first
    rw.writeLock();
    Link* pCurrent = pFirst;
    // start of sorting keys in order
    while (pCurrent != NULL && key > pCurrent->key)
    {                                       // or pCurrent > key,
        pPrevious = pCurrent;
        pCurrent = pCurrent->pNext;         // go to next item
    }
    // end of sorting the keys in order
    if (pPrevious == NULL)                  // if beginning of list,
        pFirst = pLink;                     // first -> new link
    else                                    // not at the beginning
        pPrevious->pNext = pLink;           // prev -> new link
    pLink->pNext = pCurrent;
    rw.writeUnlock();
} // end insert

void SortedList::remove(int key)            // delete key
{
    Link* pPrevious = NULL;                 // start at first
    rw.writeLock();
    Link* pCurrent = pFirst;
    while (pCurrent != NULL && key != pCurrent->key)
    {                                       // or key == current,
        pPrevious = pCurrent;
        pCurrent = pCurrent->pNext;         // go to next link
    }
    // disconnect link; nothing to do if the key is absent
    // or the list is empty
    if (pCurrent != NULL)
    {
        if (pPrevious == NULL)              // if beginning of list
            pFirst = pFirst->pNext;         // delete first link
        else                                // not at beginning
            pPrevious->pNext = pCurrent->pNext; // delete current link
    }
    rw.writeUnlock();
}

Link* SortedList::find(int key)             // find link
{
    rw.readLock();
    Link* pCurrent = pFirst;                // start at first
    // until end of list,
    while (pCurrent != NULL && pCurrent->key <= key)
    {                                       // or key too small
        if (pCurrent->key == key)           // is this the link?
        {
            rw.readUnlock();
            return pCurrent;                // found it, return link
        }
        pCurrent = pCurrent->pNext;         // go to next item
    }


    rw.readUnlock();
    return NULL;
}

bool SortedList::boolFind(int key)          // find link
{
    rw.readLock();
    Link* pCurrent = pFirst;                // start at first
    // until end of list,
    while (pCurrent != NULL && pCurrent->key <= key)
    {                                       // or key too small
        if (pCurrent->key == key)           // is this the link?
        {
            rw.readUnlock();
            return true;                    // found it
        }
        pCurrent = pCurrent->pNext;         // go to next item
    }
    rw.readUnlock();
    return false;
}

string SortedList::displayList()
{
    string s("List (key->data): ");
    rw.readLock();
    Link* pCurrent = pFirst;                // start at beginning of list
    while (pCurrent != NULL)                // until end of list
    {
        s += pCurrent->displayLink();       // print data
        pCurrent = pCurrent->pNext;         // move to next link
    }
    rw.readUnlock();
    return s;
}

int SortedList::dataVal(int key)
{
    rw.readLock();
    Link* link = find(key);
    // find() returns NULL for an absent key; report 0 in that case
    int data = (link != NULL) ? link->data : 0;
    rw.readUnlock();
    return data;
}

bool SortedList::modCAS(int key, int old_data, int new_data)
{
    rw.readLock();
    Link* link = find(key);
    // an absent key cannot be updated, so the swap fails
    bool res = (link != NULL) && cas(link->data, old_data, new_data);
    rw.readUnlock();
    return res;
}


int SortedList::modLock(int key)
{
    rw.readLock();
    Link* link = find(key);
    int status = pthread_mutex_lock(&link->mutex);
    if (status != 0)
        cout << "error: mutex lock\n";
    return link->data;
}

void SortedList::modUnlock(int key, int data)
{
    Link* link = find(key);
    link->data = data;
    int status = pthread_mutex_unlock(&link->mutex);
    rw.readUnlock();
    if (status != 0)
        cout << "error: mutex unlock\n";
}

void* cell_thread(void *arg)
{
    int id = -1;
    threads_t *tag = static_cast<threads_t*>(arg);
    int TAG = 0;
    while (true)
    {
        // yield until one of this thread's request queues has work
        while ((input[tag->id].size() == 0) &&
               (remover[tag->id].size() == 0) &&
               (finder[tag->id].size() == 0) &&
               (caser[tag->id].size() == 0) &&
               (locker[tag->id].size() == 0) &&
               (unlocker[tag->id].size() == 0))
            sched_yield();
        if (input[tag->id].size() != 0)
        {
            id = input[tag->id].front().id;
            tag->hashArray[0][id]->insert(input[tag->id].front().link);
            input[tag->id].pop();
        }
        if (remover[tag->id].size() != 0)
        {
            id = remover[tag->id].front().id;
            tag->hashArray[0][id]->remove(remover[tag->id].front().key);
            remover[tag->id].pop();
        }
        if (finder[tag->id].size() != 0)
        {
            id = finder[tag->id].front()->id;
            finder[tag->id].front()->link =
                tag->hashArray[0][id]->find(finder[tag->id].front()->key);
            finder[tag->id].front()->done = true;


92 finder[tag->id].pop(); } if (caser[tag->id].size() != 0) { id = caser[tag->id].front()->id; caser[tag->id].front()->boolean = tag->hashArray[0][id]->modCAS (caser[tag->id].front()->key, caser[tag->id].front()->old_data, caser[tag->id].front()->new_data); caser[tag->id].front()->done = true; caser[tag->id].pop(); } if ((TAG == 0) && (locker[tag->id].size() != 0)) { pthread_mutex_lock(&mutexLocker[tag->id].lock); TAG = 1; pthread_mutex_unlock(&mutexLocker[tag->id].lock); id = locker[tag->id].front()->id; locker[tag->id].front()->value = tag->hashArray[0][id]->modLock (locker[tag->id].front()->key); locker[tag->id].front()->done = true; locker[tag->id].pop(); } if ((TAG == 1) && (unlocker[tag->id].size() != 0)) { pthread_mutex_lock(&mutexLocker[tag->id].lock); TAG = 2; pthread_mutex_unlock(&mutexLocker[tag->id].lock); id = unlocker[tag->id].front()->id; tag->hashArray[0][id]->modUnlock(unlocker[tag->id].front()->key, unlocker[tag->id].front()->value); unlocker[tag->id].front()->done = true; unlocker[tag->id].pop(); id = -1; pthread_mutex_lock(&mutexLocker[tag->id].lock); TAG = 0; pthread_mutex_unlock(&mutexLocker[tag->id].lock); } } } HashTable::HashTable(int size, int tNum) : arraySize(size), numThreads(tNum) // constructor { int status; arraySize = size; hashArray.resize(arraySize); // set vector size input.resize(arraySize); remover.resize(arraySize);


    finder.resize(arraySize);
    caser.resize(arraySize);
    locker.resize(arraySize);
    unlocker.resize(arraySize);
    mutexLocker.resize(arraySize);
    mutexUnlocker.resize(arraySize);
    for (int j = 0; j < arraySize; j++)     // fill vector
    {
        hashArray[j] = new SortedList;      // with list
        cout << "slot[" << j << "] is assigned to thread "
             << hashThread(j) << endl;
    }
    thread = new pthread_t[tNum];           // set thread vector size
    for (int j = 0; j < tNum; j++)
    {
        tag = new threads_t;
        tag->id = j;
        tag->hashArray = &hashArray;
        status = pthread_create(&thread[j], NULL, cell_thread,
                                static_cast<void*>(tag));
        if (status != 0)
            err_abort(status, "Create thread");
    }
}

void HashTable::init(int size, int tNum)
{
    arraySize = size;
    numThreads = tNum;
    int status;
    hashArray.resize(arraySize);            // set vector size
    input.resize(arraySize);
    remover.resize(arraySize);
    finder.resize(arraySize);
    caser.resize(arraySize);
    locker.resize(arraySize);
    unlocker.resize(arraySize);
    mutexLocker.resize(arraySize);
    mutexUnlocker.resize(arraySize);
    for (int j = 0; j < arraySize; j++)     // fill vector
    {
        hashArray[j] = new SortedList;      // with list
        cout << "slot[" << j << "] is assigned to thread "
             << hashThread(j) << endl;
    }
    thread = new pthread_t[tNum];           // set thread vector size
    for (int j = 0; j < tNum; j++)
    {
        tag = new threads_t;


        tag->id = j;
        tag->hashArray = &hashArray;
        status = pthread_create(&thread[j], NULL, cell_thread,
                                static_cast<void*>(tag));
        if (status != 0)
            err_abort(status, "Create thread");
    }
}

void HashTable::displayTable()
{
    for (int j = 0; j < arraySize; j++)     // for each cell
    {
        cout << j << ". ";
        cout << hashArray[j]->displayList() << endl; // display list
    }
}

string HashTable::displayList(int x)
{
    string s = itoa(x);
    s += ". ";
    s += hashArray[x]->displayList();
    return s;
}

int HashTable::hashFunc(int key)
{
    return key % arraySize;                 // hash function
}

int HashTable::hashThread(int key)
{
    return key % numThreads;
}

void HashTable::insert(Link* pLink)         // insert a dataItem
{                                           // (assumes table not full)
    int status;
    tInput in;
    in.link = pLink;
    int key = pLink->key;                   // extract key
    int hashVal = hashFunc(key);            // hash the key
    in.id = hashVal;
    hashVal = hashThread(hashVal);
    input[hashVal].push(in);
} // end insert()

void HashTable::remove(int key)             // remove a DataItem


{
    tOutput out;
    out.key = key;
    int hashVal = hashFunc(key);            // hash the key
    out.id = hashVal;
    hashVal = hashThread(hashVal);
    remover[hashVal].push(out);
} // end remove()

Link* HashTable::FIND(int key)              // find item with key
{
    int hashVal = hashFunc(key);            // hash the key
    return hashArray[hashVal]->find(key);   // get link
}

Link* HashTable::find(int key)              // find item with key
{
    tOutput *out = new tOutput;
    out->key = key;
    int hashVal = hashFunc(key);            // hash the key
    out->id = hashVal;
    hashVal = hashThread(hashVal);
    finder[hashVal].push(out);
    while (out->isDone() == false)          // wait for the cell thread
        ;
    return out->link;
    // return hashArray[hashVal]->find(key); // get link
}

bool HashTable::boolFind(int key)           // find item with key
{
    int hashVal = hashFunc(key);            // hash the key
    return hashArray[hashVal]->find(key) != NULL;
}

int HashTable::dataVal(int key)
{
    int hashVal = hashFunc(key);
    return hashArray[hashVal]->dataVal(key);
}

bool HashTable::modCAS(int key, int old_data, int new_data)
{
    tOutput *out = new tOutput;
    out->key = key;
    out->old_data = old_data;
    out->new_data = new_data;
    int hashVal = hashFunc(key);
    out->id = hashVal;


96 hashVal = hashThread(hashVal); caser[hashVal].push(out); while (out->isDone() == false) ; return out->boolean; } int HashTable::modLock(int key) { tOutput *out = new tOutput; out->key = key; int hashVal = hashFunc(key); out->id = hashVal; hashVal = hashThread(hashVal); locker[hashVal].push(out); while (out->isDone() == false) ; return out->value; } void HashTable::modUnlock(int key, int data) { tOutput *out = new tOutput; out->key = key; out->value = data; int hashVal = hashFunc(key); out->id = hashVal; hashVal = hashThread(hashVal); unlocker[hashVal].push(out); }
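In hash.C a client posts a tOutput request to a per-thread queue and then spins on its done flag until the cell thread has filled in the result. The sketch below shows one way to write that hand-off so the flag and the queue are safe to share between threads in standard C++. It is an illustration under stated assumptions, not the author's design: Request, Mailbox, serve, call, and roundTrip are hypothetical names, and the mutex-protected queue is a simplification that is not itself wait-free.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical request record, loosely modeled on tOutput.
struct Request {
    int key = 0;
    int value = 0;
    std::atomic<bool> done{false};   // atomic, so spinning on it is well defined
};

// Hypothetical mailbox: a queue of request pointers guarded by a mutex.
class Mailbox {
    std::mutex m;
    std::queue<Request*> q;
public:
    void push(Request *r)
    {
        std::lock_guard<std::mutex> g(m);
        q.push(r);
    }
    // Returns the next request, or nullptr if the queue is empty.
    Request *tryPop()
    {
        std::lock_guard<std::mutex> g(m);
        if (q.empty())
            return nullptr;
        Request *r = q.front();
        q.pop();
        return r;
    }
};

// Worker loop: serve requests until `stop` is set.
void serve(Mailbox &box, std::atomic<bool> &stop)
{
    while (!stop.load(std::memory_order_acquire)) {
        Request *r = box.tryPop();
        if (!r) {
            std::this_thread::yield();
            continue;
        }
        r->value = r->key * 2;       // placeholder for the table operation
        // Release store: the result is visible before done reads true.
        r->done.store(true, std::memory_order_release);
    }
}

// Client side: post a request and spin until the worker marks it done.
int call(Mailbox &box, int key)
{
    Request r;
    r.key = key;
    box.push(&r);
    while (!r.done.load(std::memory_order_acquire))
        std::this_thread::yield();
    return r.value;
}

// Start one worker, make one call, and shut the worker down.
int roundTrip(int key)
{
    Mailbox box;
    std::atomic<bool> stop{false};
    std::thread worker(serve, std::ref(box), std::ref(stop));
    int result = call(box, key);
    stop.store(true, std::memory_order_release);
    worker.join();
    return result;
}
```

The worker does not touch the request after the release store, so the client may keep the request on its own stack and let it go out of scope as soon as the spin ends.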


// rwlock.h
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#ifndef RWLOCK_H
#define RWLOCK_H
#include <pthread.h>

void readCleanup(void *arg);
void writeCleanup(void *arg);

class RWLock
{
    pthread_mutex_t mutex;
    pthread_cond_t read;         // wait for read
    pthread_cond_t write;        // wait for write
    int valid;                   // set when valid
    int r_active;                // readers active
    int w_active;                // writer active
    int r_wait;                  // readers waiting
    int w_wait;                  // writers waiting
    static const int RWLOCK_VALID = 0xfacade;
#define RWL_INITIALIZER \
    {PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
     PTHREAD_COND_INITIALIZER, RWLOCK_VALID, 0, 0, 0, 0}
    int readTryLock();
    int writeTryLock();
public:
    RWLock();
    ~RWLock();
    int readLock();
    int readUnlock();
    int writeLock();
    int writeUnlock();
    void readcleanup();
    void writecleanup();
};
#endif // RWLOCK_H


// rwlock.C
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#include "rwlock.h"
#include <pthread.h>
#include <errno.h>

// Initialize a read/write lock class
RWLock::RWLock()
{
    int status;
    r_active = 0;
    r_wait = 0;
    w_wait = 0;
    w_active = 0;
    status = pthread_mutex_init(&mutex, NULL);
    if (status != 0)
        ; //return status;
    status = pthread_cond_init(&read, NULL);
    if (status != 0)
    {
        pthread_mutex_destroy(&mutex);
    }
    status = pthread_cond_init(&write, NULL);
    if (status != 0)
    {
        pthread_cond_destroy(&read);
        pthread_mutex_destroy(&mutex);
    }
    valid = RWLOCK_VALID;
}

RWLock::~RWLock()
{
    int status, status1, status2;
    if (valid != RWLOCK_VALID)
        ; //return EINVAL;
    status = pthread_mutex_lock(&mutex);
    if (status != 0)
        ; //return status;
    // check whether any threads own a lock; report "BUSY" if so
    if (r_active > 0 || w_active)
    {


99 pthread_mutex_unlock(&mutex); //return EBUSY; } // check whether any thread are known to be waiting; // report EBUSY if so if (r_wait != 0 || w_wait != 0) { pthread_mutex_unlock(&mutex); //return EBUSY; } valid = 0; status = pthread_mutex_unlock(&mutex); if (status != 0) ;//return status; status = pthread_mutex_destroy(&mutex); status1 = pthread_cond_destroy(&read); status2 = pthread_cond_destroy(&write); } /* * Handle a cleanup when the read lock condition variable * wait is canceled. * * Simply record that the thread is no longer waiting, * and unlock the mutex. */ void RWLock::readcleanup() { r_wait--; pthread_mutex_unlock(&mutex); } void readCleanup(void *arg) { RWLock *temp = (RWLock *) arg; temp[0].readcleanup(); } /* * Handle cleanup when the write lock condition variable * wait is canceled. * * Simply record that the thread is no longer waiting, * and unlock the mutex. */ void RWLock::writecleanup() { w_wait--; pthread_mutex_unlock(&mutex); } void writeCleanup(void *arg)


100 { RWLock *temp = (RWLock *) arg; temp[0].writecleanup(); } int RWLock::readLock() { int status; if (valid != RWLOCK_VALID) return -1;//EINVAL; status = pthread_mutex_lock(&mutex); if (status != 0) return status; if (w_active) { r_wait++; RWLock *temp = this; pthread_cleanup_push(readCleanup, (void *)temp); while (w_active) { status = pthread_cond_wait(&read, &mutex); if (status != 0) break; } pthread_cleanup_pop(0); r_wait--; } if (status == 0) r_active++; pthread_mutex_unlock(&mutex); return status; } /* * Attempt to lock a read/write lock for read access (don't * block if unavailable). */ int RWLock::readTryLock() { int status, status2; if (valid != RWLOCK_VALID) return -1;//EINVAL; status = pthread_mutex_lock(&mutex); if (status != 0) return status; if (w_active) status = -1;//EBUSY; else r_active++; status2 = pthread_mutex_unlock(&mutex); return (status2 != 0 ? status2 : status); }


101 int RWLock::readUnlock() { int status, status2; if (valid != RWLOCK_VALID) return -1; // EINVAL; status = pthread_mutex_lock(&mutex); if (status != 0) return status; r_active--; if (r_active == 0 && w_wait > 0) status = pthread_cond_signal(&write); status2 = pthread_mutex_unlock(&mutex); return (status2 == 0 ? status : status2); } // Lock a read/write lock for a write access int RWLock::writeLock() { int status; if (valid != RWLOCK_VALID) return -1; // EINVAL; status = pthread_mutex_lock(&mutex); if (status != 0) return status; if (w_active || r_active > 0) { w_wait++; RWLock *temp = this; pthread_cleanup_push(writeCleanup, (void*)temp); while (w_active || r_active > 0) { status = pthread_cond_wait(&write, &mutex); if (status != 0) break; } pthread_cleanup_pop(0); w_wait--; } if (status == 0) w_active = 1; pthread_mutex_unlock(&mutex); return status; } /* * Attempt to lock a read/write lock for a write access. Don't * block if unavailable. */ int RWLock::writeTryLock() { int status, status2; if (valid != RWLOCK_VALID)


102 return -1; //EINVAL; status = pthread_mutex_lock(&mutex); if (status != 0) return status; if (w_active || r_active > 0) status = -1; //EBUSY; else w_active = 1; status2 = pthread_mutex_unlock(&mutex); return (status != 0 ? status : status2); } // Unlock a read/write lock from write access. int RWLock::writeUnlock() { int status; if (valid != RWLOCK_VALID) return -1; // EINVALID status = pthread_mutex_lock(&mutex); if (status != 0) return status; w_active = 0; if (r_wait > 0) { status = pthread_cond_broadcast(&read); if (status != 0) { pthread_mutex_unlock(&mutex); return status; } } else if (w_wait > 0) { status = pthread_cond_signal(&write); if (status != 0) { pthread_mutex_unlock(&mutex); return status; } } status = pthread_mutex_unlock(&mutex); return status; }
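RWLock above is built by hand from a mutex and two condition variables. For comparison only, here is a minimal sketch of the same reader/writer discipline using std::shared_mutex from C++17. The GuardedTable class is hypothetical and stands in for SortedList. The standard lock does not specify whether waiting readers or writers go first, so it may schedule them differently from RWLock.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical table guarded by a reader/writer lock; not in the listing.
class GuardedTable {
    mutable std::shared_mutex rw;
    std::map<int, int> items;
public:
    // Writers take the lock exclusively, like writeLock()/writeUnlock().
    void insert(int key, int data)
    {
        std::unique_lock<std::shared_mutex> w(rw);
        items[key] = data;
    }
    // Readers share the lock, like readLock()/readUnlock().
    // Returns false when the key is absent.
    bool find(int key, int &data) const
    {
        std::shared_lock<std::shared_mutex> r(rw);
        std::map<int, int>::const_iterator it = items.find(key);
        if (it == items.end())
            return false;
        data = it->second;
        return true;
    }
};
```

The lock objects release on scope exit, so an early return such as the one in find() cannot leave the lock held.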


// server.C
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.
#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>   // sleep
#include "Socket.h"
#include "hash.h"
using namespace std;

void *threadRunnable(void* arg);

HashTable theHashTable;
const char* True = "true\n";
const char* False = "false\n";

struct parameter
{
    char *st1;
    char *st2;
    int port;
};

int main(int argc, char* argv[])
{
    char temp[2];
    int port = -1;
    pthread_t tid;
    Link* pDataItem;
    int size, tSize, n;
    const int keysPerCell = 100;
    time_t aTime;
    int aKey;
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " []\n";
        return 1;
    }
    sscanf(argv[1], "%d", &port);
    std::cout << "Enter size of hash table: ";
    std::cin >> size;
    do
    {
        std::cout << "Enter number of threads: ";

        std::cin >> tSize;
        if (tSize > size)
            std::cerr << "Error: enter a number <= " << size << std::endl;
    } while (tSize > size);
    std::cout << "Enter initial number of items: ";
    std::cin >> n;
    theHashTable.init(size, tSize);
    std::cout << "Going to init hash table\n";
    srand(static_cast<unsigned int>(time(&aTime)));
    for(int x = 0; x < n; x++) {
        // Draw random keys until one is found that is not yet in the table.
        do {
            aKey = rand() % (keysPerCell * size);
            pDataItem = theHashTable./*find*/FIND(aKey);
        } while (pDataItem != NULL);
        pDataItem = new Link(aKey);
        theHashTable.insert(pDataItem);
        cout << "element " << (x+1) << " inserted\n";
    }
    sleep(1);
    theHashTable.displayTable();
    std::cout << "Enter number of ports: ";
    int ports;
    std::cin >> ports;
    std::cout << "ready to execute\n";
    for(int x = -1; x <= ports; x++) {
        parameter *param = new parameter;
        param[0].st1 = argv[1];
        param[0].st2 = argv[2];
        param[0].port = port + (x * 10);
        pthread_create(&tid, NULL, &threadRunnable, param);
        sleep(2);
        delete param;
    }
    while (true)
        ;
}

void *threadRunnable(void* arg)
{
    string buf;
    string buf2;
    char temp[2];
    char ch;
    int aKey;
    Link* pDataItem;

    SortedList sl;
    bool result;
    parameter *param = (parameter *)arg;
    int port = param[0].port;
    net::ServerStreamSocket sp(port);
    std::cout << "Port: " << port << " listening\n";
    int back = 0;
    int count;
    int oldCount = -1;
again:
    try {
        sp.accept();
        while (true) {
            sp.read(buf);
            ch = buf[0];
            int a = 1;
            switch(ch) {
            case 'c' :
                int old_value, new_value, increment;
                sp.write("ack\n");
                sp.read(buf);
                aKey = atoi(buf.c_str());
                sp.write("ack\n");
                sp.read(buf);
                count = atoi(buf.c_str());
                old_value = theHashTable.dataVal(aKey);
                buf = itoa(old_value);
                buf += "\n";
                sp.write(buf);
                sp.read(buf);
                new_value = atoi(buf.c_str());
                increment = new_value - old_value;
                result = theHashTable.modCAS(aKey, old_value, new_value);
                // On CAS failure, re-read the value and retry with the same increment.
                while (result == false) {
                    old_value = theHashTable.dataVal(aKey);
                    new_value = old_value + increment;
                    result = theHashTable.modCAS(
                        aKey, old_value, new_value);
                }
                sp.write("true\n");
                break;
            case 'l' :
                sp.write("ack\n");
                sp.read(buf);
                aKey = atoi(buf.c_str());
                old_value = theHashTable.modLock(aKey);
                buf = itoa(old_value);
                buf += " \n";
                sp.write(buf);

                sp.read(buf);
                sp.write("ack\n");
                new_value = atoi(buf.c_str());
                theHashTable.modUnlock(aKey, new_value);
                break;
            case 's' :
                buf2 = itoa(theHashTable.listSize());
                buf2 += "\n";
                sp.write(buf2);
                for(int x = 0; x < atoi(buf2.c_str()); x++) {
                    string temp = theHashTable.displayList(x);
                    temp += "\n";
                    if (sp.bufferSize() < temp.length())
                        sp.setBuffer(temp.length() + 1);
                    sp.write(temp);
                }
                break;
            case 'i' :
                sp.write("ack\n");
                sp.read(buf);
                sp.write("ack\n");
                aKey = atoi(buf.c_str());
                pDataItem= new Link(aKey);
                theHashTable.insert(pDataItem);
                sleep(1);
                break;
            case 'd' :
                sp.write("ack\n");
                sp.read(buf);
                sp.write("ack\n");
                aKey = atoi(buf.c_str());
                theHashTable.remove(aKey);
                sleep(1);
                break;
            case 'f' :
                sp.write("ack\n");
                sp.read(buf);
                aKey = atoi(buf.c_str());
                pDataItem = theHashTable.find(aKey);
                if (pDataItem != NULL)
                    sp.write(True);
                else
                    sp.write(False);
                break;
            case 'v' :
                sp.write("ack\n");
                sp.read(buf);
                aKey = atoi(buf.c_str());
                pDataItem = theHashTable.find(aKey);
                if (pDataItem != NULL)

                {
                    buf = itoa(pDataItem->data);
                    buf += "\n";
                    sp.write(buf);
                }
                else
                    sp.write("null\n");
                break;
            case 'x' :
                sp.close();
                sp.accept();
                break;
            case 'X' :
                goto again;
            }
        }
    }
    catch(net::SocketException& e) {
        std::cout << e.what() << std::endl;
    }
    catch(...) {
        sp.close();
    }
    return NULL;
}
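The retry loop in case 'c' of the server (read the current value, compute the new value from the increment, attempt the compare-and-swap, and repeat on failure) can be sketched in isolation with std::atomic. This is a minimal sketch under stated assumptions: the hash table's modCAS and dataVal are not shown in this listing, so a single std::atomic<int> cell stands in for one table entry, and the names casAdd and runCasDemo are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Adds `increment` to `cell` without taking a lock.
// Mirrors the server's loop: read old value, compute new value, try CAS, retry.
int casAdd(std::atomic<int>& cell, int increment) {
    int old_value = cell.load();
    // On failure compare_exchange_weak stores the current value into
    // old_value, so the next attempt recomputes from fresh data.
    while (!cell.compare_exchange_weak(old_value, old_value + increment)) {
    }
    return old_value + increment;
}

// Starts `threads` workers that each add 1 to a shared cell `perThread` times.
int runCasDemo(int threads, int perThread) {
    std::atomic<int> cell{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&cell, perThread] {
            for (int i = 0; i < perThread; ++i) casAdd(cell, 1);
        });
    for (auto& th : pool) th.join();
    return cell.load();
}
```

No update is lost even though no thread ever blocks another. Taken by itself, a loop of this shape guarantees only that some thread makes progress on each round; an individual thread may in principle retry many times under heavy contention.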

// tester.java
// (C) Copyright 2002 Joseph S. Berrios
//
// Permission to copy, use, modify, sell and distribute this
// software is granted provided this copyright notice appears
// in all copies.
// This software is provided "as is" without express or
// implied warranty, and with no claim as to its suitability
// for any purpose.

import java.net.*;
import java.io.*;
import java.lang.*;
import java.util.*;

// Timer class
class Timer {
    long t;

    // constructor
    public Timer() {
        reset();
    }

    // reset timer
    public void reset() {
        t = System.currentTimeMillis();
    }

    // return elapsed time
    public long elapsed() {
        return System.currentTimeMillis() - t;
    }

    // print explanatory string and elapsed time
    public void print(String s) {
        System.out.println(s + ": " + elapsed());
    }
}

class threads implements Runnable {
    int id;
    String host;
    int port;
    int key;
    int iter;
    char sync;
    String result;
    String buffer;
    static int tNum;
    static Semaphore barrier;
    static Vector waitint;
    static boolean threadSuspended;
    long stopTimer;

    static Semaphore sem;

    threads(int x, String h, int p, int k, int i, String s) {
        id = x;
        host = h;
        port = p + (id * 10);
        key = k;
        sync = s.charAt(0);
        iter = i;
    }

    threads() { }

    public void run() {
        try {
            execute();
        } catch(IOException e) { }
    }

    // Integer power: returns a raised to the power b.
    int pow(int a, int b) {
        int result = 1;
        for(int x = 0; x < b; x++) {
            result *= a;
        }
        return result;
    }

    // Parse a non-negative decimal integer, skipping leading NUL characters.
    int atoi(String buf) {
        int value = 0;
        char ch;
        int x = 0;
        do {
            ch = buf.charAt(x++);
        } while (((int)ch) == 0);
        value = ch - 48;
        try {
            ch = buf.charAt(x++);
            while (('0' <= (char)ch) && ((char)ch <= '9')) {
                value *= 10;
                value += (ch - 48);
                ch = buf.charAt(x++);
            }
        } catch(Exception e) {
            return value;
        }
        return value;
    }

    void execute() throws IOException {

        long init = 0;
        int SLEEP = 1000;
        InetAddress addr = InetAddress.getByName(host);
        Socket socket = new Socket(addr, port);
        BufferedReader in;
        PrintWriter out;
        int a = 1;
        int x = 0;
        Timer timer = new Timer();
        timer.reset();
        again:
        while (true) {
            try {
                in = new BufferedReader(
                    new InputStreamReader(
                        socket.getInputStream()));
                out = new PrintWriter(
                    new BufferedWriter(
                        new OutputStreamWriter(
                            socket.getOutputStream())), true);
                if (sync == 'c') {
                    int num = -1;
                    for(; x < iter; x++) {
                        out.println("c");
                        buffer = in.readLine();
                        out.println(key);
                        buffer = in.readLine();
                        out.println(barrier.getCount());
                        buffer = in.readLine();
                        int temp = atoi(buffer);
                        temp = temp + 1;
                        out.println(temp);
                        buffer = in.readLine();
                    }
                } else if (sync == 'l') {
                    for(; x < iter; x++) {
                        out.println("l");
                        buffer = in.readLine();
                        out.println(key);
                        buffer = in.readLine();
                        int temp = atoi(buffer);
                        temp = temp + 1;
                        out.println(temp);
                        buffer = in.readLine();
                    }
                }
                out.println("v");
                buffer = in.readLine();
                out.println(key);

                buffer = in.readLine();
                out.println("X");
            } catch(Exception e) {
                System.out.println(id + " error:" + a);
                System.out.println(e);
                try {
                    Thread.sleep(10000);
                } catch(Exception ee) { }
                continue again;
            } finally {
                socket.close();
                barrier.V();
                stopTimer = timer.elapsed();
                break;
            }
        }
    }
}

class tester {
    public static void main(String args[]) throws IOException {
        if (args.length != 4) {
            System.out.println("usage java tester " + "([ | ] ");
            System.exit(0);
        }
        InputStreamReader streamIn = new InputStreamReader(System.in);
        BufferedReader inp = new BufferedReader(streamIn, 1);
        String buffer;
        System.out.print("Enter number of threads: ");
        buffer = inp.readLine();
        int max = Integer.parseInt(buffer);
        threads ts[];
        Thread t;
        ts = new threads[max];
        int iter = Integer.parseInt(args[2]);
        for(int x = 0; x < max; x++) {
            System.out.print("Enter key that thread[" + x + "] will modify: ");
            buffer = inp.readLine();
            int key = Integer.parseInt(buffer);
            ts[x] = new threads(x, args[0], Integer.parseInt(args[1]),
                                key, iter, args[3]);

        }
        ts[0].tNum = max;
        ts[0].barrier = new Semaphore();
        ts[0].barrier.putCount(max);
        ts[0].sem = new Semaphore();
        ts[0].threadSuspended = false;
        for(int x = 0; x < max; x++) {
            t = new Thread(ts[x]);
            t.start();
        }
        ts[0].barrier.P();
        ts[0].barrier.V();
        for(int x = 0; x < max; x++) {
            System.out.println("Thread[" + (x+1) + "]: " + ts[x].stopTimer);
        }
    }
}
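The tester's main thread uses the shared barrier semaphore to hold off printing the timings until the workers have signalled completion. The Semaphore class is not included in this listing, so its exact P, V, and putCount semantics are unknown; the following C++ sketch only assumes the apparent intent, namely that main blocks until every worker has called a completion routine. CompletionLatch and runWorkers are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical completion latch: wait() blocks until done() has been
// called `count` times. This is an assumed reading of the tester's barrier.
class CompletionLatch {
public:
    explicit CompletionLatch(int count) : remaining(count) {}

    // Called once by each worker when it has finished.
    void done() {
        std::lock_guard<std::mutex> guard(m);
        if (--remaining == 0) cv.notify_all();
    }

    // Called by the coordinating thread; returns once all workers are done.
    void wait() {
        std::unique_lock<std::mutex> guard(m);
        cv.wait(guard, [this] { return remaining <= 0; });
    }

private:
    std::mutex m;
    std::condition_variable cv;
    int remaining;
};

// Starts `workers` threads, waits for all of them through the latch,
// then sums the per-thread results (a stand-in for the recorded timings).
int runWorkers(int workers) {
    CompletionLatch latch(workers);
    std::vector<int> results(workers, 0);
    std::vector<std::thread> pool;
    for (int id = 0; id < workers; ++id)
        pool.emplace_back([&latch, &results, id] {
            results[id] = id + 1;  // placeholder for the timed work
            latch.done();
        });
    latch.wait();  // every result has been written at this point
    int sum = 0;
    for (int r : results) sum += r;
    for (auto& t : pool) t.join();
    return sum;
}
```

Each worker writes its result before calling done(), and the mutex inside the latch orders that write before the read in the coordinating thread, so the results are safe to read after wait() returns.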

BIOGRAPHICAL SKETCH

Joseph Stephen Berrios is a native of San Juan, Puerto Rico. After finishing high school he joined the United States Naval Reserve and reported for active duty in 1987. After completing initial active duty for training, he pursued a bachelor’s degree in computer science at the Inter American University of Puerto Rico. After completing his bachelor’s degree, he went for an internship at Argonne National Labs. He relocated to Florida and pursued a master’s degree in computer science at the University of South Florida. In 1996 he started doctoral studies at the University of Florida funded by the Minority Engineering Doctorate Initiative (MEDI) fellowship, being the first student in the department to win this award. In 1996 he accepted a commission in the US Naval Reserve as an Engineering Duty Officer. He served on the staff of Commander, Navy Recruiting Command in Washington, DC, and on the staffs of Commander, Naval Air Reserve Force, and Commander, Naval Reserve Force, both in New Orleans, Louisiana. His designator was changed in 1999 to Aerospace Engineering Duty Officer. He attended the Defense Acquisition University and completed courses in defense acquisition management and software acquisition management. Under his new designator, he has served at Patrol Squadron 62 and at Naval Air Depot Jacksonville, both located in Jacksonville, Florida. He owes the successes of his career to the support of a loving family, many friends, a cadre of thoughtful and generous mentors and colleagues, and countless professionals in the Navy, Naval Reserve, and the computing profession with whom he has served.