CIS 307: TOPIC: Machine Instructions for Concurrency. Spinlocks
These topics are not treated in Tanenbaum (only TestAndSet is)
[TestAndSet],
[Swap and Fetch_and_Store],
[Compare and Swap],
[Non-Blocking Concurrent Queue],
[Yield],
[Spinlocks]
We consider some machine instructions that support concurrency.
TestAndSet and Swap are used (one or the other) for implementing spinlocks
when more than one processor is used (hence interrupt masking is not
sufficient). These two instructions, and all other instructions used for
implementing spinlocks, do some reads and some writes to memory in a single
atomic action.
Yield is used to support simple forms of context switching.
Always be conscious of the hardware support for Operating System functions.
There are two reasons for this support: sometimes it is the only way to
solve the problem, and it always allows solutions that are more efficient
than purely software solutions.
TestAndSet
We can think of the TestAndSet instruction as a function implemented atomically
in hardware. Here is this function:
int TestAndSet(int *x) {
    int temp = *x;   /* read the old value */
    *x = 1;          /* set the location to 1 */
    return temp;     /* return the old value: 0 means it was free */
}
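On current hardware the same effect is available without writing assembly
language: C11's <stdatomic.h> provides atomic_flag, whose test-and-set
operation has exactly these semantics. A minimal sketch (the wrapper name
and variable are ours, for illustration):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag x = ATOMIC_FLAG_INIT;   /* the shared location, initially clear */

bool test_and_set(void) {
    /* Atomically set the flag and return its previous value,
       just like the TestAndSet pseudocode above. */
    return atomic_flag_test_and_set(&x);
}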
Swap and Fetch_and_Store
The Swap instruction can also be thought of as a procedure implemented
atomically in hardware.
Here is this procedure:
procedure Swap (a : in out integer; b : in out integer);
Local Variable temp : integer;
begin
    temp := a;
    a := b;
    b := temp;
end;
It is easy to see that Swap can be used to implement TestAndSet. [Just store
1 in b, swap, and return b.]
Swap can be implemented in terms of TestAndSet, but it is more
difficult [you need to create a critical region].
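Here is the first reduction written out in C11, where the atomic Swap is
modeled with atomic_exchange; a sketch under that assumption (since b is a
private local variable, only *x needs the atomic access):

#include <stdatomic.h>

static void Swap(atomic_int *a, int *b) {
    *b = atomic_exchange(a, *b);   /* exchange *a and *b in one atomic step */
}

int TestAndSet(atomic_int *x) {
    int b = 1;      /* store 1 in b */
    Swap(x, &b);    /* swap: b receives the old *x */
    return b;       /* return b: the old value of *x */
}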
A similar atomic machine instruction is fetch_and_store:
function Fetch_and_Store (a : in out integer; b : in integer) : integer;
Local Variable temp : integer;
begin
    temp := a;
    a := b;
    return temp;
end;
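In C11 this instruction is exposed directly as atomic_exchange, which stores
a new value and returns the old one in a single atomic step; for instance
(variable names ours):

#include <stdatomic.h>

atomic_int a;

int fetch_and_store(int b) {
    return atomic_exchange(&a, b);   /* temp := a; a := b; return temp */
}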
Compare and Swap
A more powerful atomic instruction is Compare-and-Swap. It is not
available on most computers, but it may be implemented in terms of other
atomic instructions. The instruction takes three arguments: the first is
an address, and the second and third are integer values, called
respectively the old and the new value. The instruction atomically does
the following:
function Compare_and_Swap(addr : ^integer; old : integer; new : integer) : boolean;
begin
    compare the value at addr to old;
    if they are equal then
        set the value at addr to new and return success
    else
        return failure
end;
This instruction has been used to create protected, non-blocking objects
(for example queues, as shown in the next section), that is, objects that
can be modified atomically and without forcing a process to wait
[neither busy-waiting nor blocked waiting].
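C11 again provides this operation, as atomic_compare_exchange_strong. Here
is a sketch of the pseudocode above, together with the typical retry loop
used to build such non-blocking objects (the function names are ours):

#include <stdatomic.h>
#include <stdbool.h>

bool Compare_and_Swap(atomic_int *addr, int old, int new_value) {
    /* Atomically: if *addr == old then *addr := new_value and succeed;
       otherwise leave *addr unchanged and report failure. */
    return atomic_compare_exchange_strong(addr, &old, new_value);
}

/* Typical use: a non-blocking shared counter. If another process updates
   the counter between our read and our CAS, the CAS fails and we simply
   retry; no process ever waits on a lock. */
void increment(atomic_int *counter) {
    int old;
    do {
        old = atomic_load(counter);
    } while (!Compare_and_Swap(counter, old, old + 1));
}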
Non-Blocking Concurrent Queue
The following example of a non-blocking algorithm is due to Michael and
Scott. It specifies a queue that can be safely used by concurrent
processes without the need for any spinlock or semaphore. In the
following, CAS stands for CompareAndSwap; the count field attached to each
pointer is incremented on every update, which protects the CAS operations
from the ABA problem (mistaking a location that has changed and changed
back for one that never changed).
structure pointer_t {ptr: pointer to node_t, count: unsigned integer}
structure node_t    {value: data type, next: pointer_t}
structure queue_t   {Head: pointer_t, Tail: pointer_t}

initialize(Q: pointer to queue_t)
   node = new node()                 // Allocate a free node
   node->next.ptr = NULL             // Make it the only node in the linked list
   Q->Head = Q->Tail = node          // Both Head and Tail point to it

enqueue(Q: pointer to queue_t, value: data type)
   node = new node()                 // Allocate a new node from the free list
   node->value = value               // Copy enqueued value into node
   node->next.ptr = NULL             // Set next pointer of node to NULL
   loop                              // Keep trying until enqueue is done
      tail = Q->Tail                 // Read Tail.ptr and Tail.count together
      next = tail.ptr->next          // Read next ptr and count fields together
      if tail == Q->Tail             // Are tail and next consistent?
         if next.ptr == NULL         // Was Tail pointing to the last node?
            if CAS(&tail.ptr->next, next, [node, next.count+1])
               break                 // Node linked at end of list; enqueue done
            endif
         else                        // Tail was not pointing to the last node
            CAS(&Q->Tail, tail, [next.ptr, tail.count+1])
                                     // Try to swing Tail to the next node
         endif
      endif
   endloop
   CAS(&Q->Tail, tail, [node, tail.count+1])
                                     // Enqueue is done; try to swing Tail
                                     // to the inserted node

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
   loop                              // Keep trying until dequeue is done
      head = Q->Head                 // Read Head
      tail = Q->Tail                 // Read Tail
      next = head.ptr->next          // Read Head.ptr->next
      if head == Q->Head             // Are head, tail, and next consistent?
         if head.ptr == tail.ptr     // Is queue empty or Tail falling behind?
            if next.ptr == NULL      // Is queue empty?
               return FALSE          // Queue is empty; couldn't dequeue
            endif
            CAS(&Q->Tail, tail, [next.ptr, tail.count+1])
                                     // Tail is falling behind; try to advance it
         else                        // No need to deal with Tail
            *pvalue = next.ptr->value
                                     // Read value before CAS, otherwise another
                                     // dequeue might free the next node
            if CAS(&Q->Head, head, [next.ptr, head.count+1])
               break                 // Head swung to the next node; dequeue done
            endif
         endif
      endif
   endloop
   free(head.ptr)                    // It is safe now to free the old dummy node
   return TRUE                       // Queue was not empty; dequeue succeeded
Yield
The Yield instruction, too, can be thought of as a procedure implemented
in hardware:

procedure Yield (a : address; b : address);
begin
    MEMORY[a] := ProgramCounter;
    ProgramCounter := MEMORY[b];
end;
Yield is used to switch from one process to another. Let's represent with P^
the location where in a process P we save its PC at the time of the switch,
and let's represent with * the location P^ of the current process P.
Then the switch from process P to process Q can be effected with
- Yield(P^, Q^); or just
- Yield(*, Q^);
It is easy to use Yield to implement co-routine structures. For example,
the "RESUME Q^" statement is replaced simply by "Yield(*, Q^)".
It is equally easy to implement a non-preemptive scheduler. Say that the
location for the scheduler S is S^ and that S manages the ready queue. Then a
process gives up control with "Yield(*, S^)", and S dispatches the selected
ready process R with "Yield(*, R^)".
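POSIX systems expose essentially this mechanism through the (old-fashioned
but still widely available) ucontext routines, where swapcontext(&P, &Q)
plays the role of Yield(P^, Q^), except that it also saves the registers.
A sketch of the RESUME-style co-routine switching described above:

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];       /* private stack for the co-routine */

static void coroutine(void) {
    printf("coroutine: part 1\n");
    swapcontext(&co_ctx, &main_ctx);   /* Yield(co^, main^): back to main */
    printf("coroutine: part 2\n");
}                                      /* on return, control goes to uc_link */

int main(void) {
    getcontext(&co_ctx);               /* initialize the co-routine context */
    co_ctx.uc_stack.ss_sp   = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;        /* where to go when coroutine() returns */
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);   /* RESUME coroutine */
    printf("main: between resumes\n");
    swapcontext(&main_ctx, &co_ctx);   /* RESUME coroutine again */
    printf("main: done\n");
    return 0;
}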
Of course, we have simplified our discussion by not worrying about the
hardware context of processes (not even the contents of the general-purpose
registers). But since we are not in a preemptive situation, it is easy
for the processes themselves to save and restore the appropriate information
when using Yield.
SpinLocks
Spinlocks are abstract data types, or classes, that support
the Busy-Wait solution of the Mutual Exclusion Problem.
The operations on spinlocks are InitLock, Lock, and UnLock.
If mutex is a spinlock that has been initialized using InitLock,
then critical regions S1, S2, ..., Sn can be protected from each other
by preceding each region with a call to Lock and following it with a call
to UnLock. So the region Si becomes:

    Lock(mutex);
    Si;
    UnLock(mutex);
The pseudocode for implementing SpinLocks follows:

TYPE SpinLock is boolean;

PROCEDURE InitLock(L : in out SpinLock);
BEGIN
    L := false;
END;

PROCEDURE Lock(L : in out SpinLock);
BEGIN
    WHILE TestAndSet(L) DO null;
END;

PROCEDURE UnLock(L : in out SpinLock);
BEGIN
    L := false;
END;
As you can see, the implementation is quite simple and involves no system
overhead. It works because of the properties of the TestAndSet instruction.
Notice that this solution involves Busy Waiting.
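For comparison, here is the same implementation in C11, with
atomic_exchange(L, 1) playing the role of TestAndSet(L). This is a sketch,
not a production lock (it has no backoff and no fairness guarantees):

#include <stdatomic.h>

typedef atomic_int SpinLock;

void InitLock(SpinLock *L) { atomic_store(L, 0); }

void Lock(SpinLock *L) {
    while (atomic_exchange(L, 1))   /* returns old value; 0 means we got the lock */
        ;                           /* busy-wait */
}

void UnLock(SpinLock *L) { atomic_store(L, 0); }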
A big problem with spinlocks is that, by repeatedly executing TestAndSet
on a location in memory, they can use a high percentage of the memory
bandwidth, thus reducing the performance of the other processors sharing
the memory. A location that is repeatedly accessed in this way is called
a Hot Spot in memory.
A partial solution (in the case of cache-coherent shared memories)
is to change the Lock operation as follows:
PROCEDURE Lock(L : in out SpinLock);
BEGIN
    WHILE TestAndSet(L) DO
        WHILE (L = 1) DO null;
END;
This uses less of the memory bandwidth: the TestAndSet instruction involves
both a read and a write transaction on memory, while the test (L = 1)
involves only a read operation, normally carried out on a cached copy.
[The inner while loop operates on the cached copy without affecting the
shared memory. It assumes that the hardware supports some form of
cache-coherence protocol that invalidates the cached copy when the
shared-memory copy is updated.]
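In the same C11 sketch as before, this improved Lock (often called
test-and-test-and-set) becomes:

#include <stdatomic.h>

typedef atomic_int SpinLock;

void Lock(SpinLock *L) {
    while (atomic_exchange(L, 1))      /* TestAndSet: a read and a write */
        while (atomic_load(L) == 1)    /* only a read, served from the cache */
            ;                          /* retry TestAndSet once L looks free */
}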
Mellor-Crummey and Scott have defined a spinlock that does not require
a coherent cache and yet does not create a hot spot:

type qnode = record
   next   : ^qnode                 // ptr to successor in queue
   locked : Boolean                // busy-waiting necessary
type lock = ^qnode                 // ptr to tail of queue

// I points to a queue-link record allocated in shared memory that is
// locally accessible and specific to the invoking processor.

procedure acquire_lock(L : ^lock; I : ^qnode)
   var pred : ^qnode;
   I->next := nil;                 // initially no successor
   pred := fetch_and_store(L, I);  // queue for lock
   if pred != nil                  // lock was not free
      I->locked := true;           // prepare to spin
      pred->next := I;             // link behind predecessor
      repeat while I->locked;      // busy-wait for lock

procedure release_lock(L : ^lock; I : ^qnode)
   if I->next = nil                // no known successor
      if compare_and_swap(L, I, nil)
         return                    // no successor, lock free
      repeat while I->next = nil;  // wait for successor
   I->next->locked := false;       // pass lock
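A C11 rendering of the same algorithm may make the pointer manipulation
easier to follow. Each waiter spins only on the locked flag of its own
qnode, so no shared location becomes a hot spot. This is a sketch: memory
orderings are left at the sequentially consistent default, and the names
are ours.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    struct qnode *_Atomic next;              /* ptr to successor in queue */
    atomic_bool locked;                      /* busy-waiting necessary */
} qnode;

typedef struct { qnode *_Atomic tail; } mcs_lock;   /* ptr to tail of queue */

void acquire_lock(mcs_lock *L, qnode *I) {
    atomic_store(&I->next, NULL);            /* initially no successor */
    qnode *pred = atomic_exchange(&L->tail, I);  /* fetch_and_store: queue up */
    if (pred != NULL) {                      /* lock was not free */
        atomic_store(&I->locked, true);      /* prepare to spin */
        atomic_store(&pred->next, I);        /* link behind predecessor */
        while (atomic_load(&I->locked))      /* busy-wait on our own node */
            ;
    }
}

void release_lock(mcs_lock *L, qnode *I) {
    qnode *succ = atomic_load(&I->next);
    if (succ == NULL) {                      /* no known successor */
        qnode *expected = I;
        if (atomic_compare_exchange_strong(&L->tail, &expected, NULL))
            return;                          /* truly no successor; lock free */
        while ((succ = atomic_load(&I->next)) == NULL)
            ;                                /* a successor is arriving; wait */
    }
    atomic_store(&succ->locked, false);      /* pass the lock */
}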
There is no guarantee of Fairness for spinlocks.
That is, if two processes P and Q are trying to enter a critical
region protected by a locked spinlock mutex, then there is no guarantee
that they will be serviced in FIFO order. Their order of entry is
effectively random.
More serious is the problem that a process P may use the spinlock within a
loop and acquire the lock repeatedly while Q remains unable to acquire it.
[How can this happen?] This behavior could be avoided by giving a "nice"
behavior to the UnLock operation: when UnLock is executed, a Yield is used
to check whether another process is waiting and, if so, to defer to it.
Of course, in most situations this is an unwarranted precaution.
Spinlocks also are dangerous to use because of the Priority Inversion Problem.
See the discussion of this issue in Tanenbaum, page 39.
Two possible solutions to the priority inversion problem:
- Determine the maximum priority among the processes that can execute
these critical regions, then make sure that all processes execute the
critical regions at that priority. Since the competing processes now have
the same priority, they cannot preempt each other.
- Execute the critical regions with masked interrupts. Thus the processor
of the current process cannot be preempted.
You may see the danger of the Priority Inversion Problem in the
well-known account of the priority inversion failures on the Mars
Pathfinder rover.
A related problem is the formation of scheduling convoys.
These occur when a process ceases to run in the middle of a critical
region [because of a page fault, quantum expiration, or external interrupt]
and all other processes sharing this region then have to wait for its
completion. A number of techniques have been introduced to deal with this
problem, in particular Preemption-safe Locking and Non-blocking Algorithms.
[Preemption-safe locking tries to avoid preemption during a critical region
by having the kernel extend the quantum of a process in the critical region
or by increasing its priority. Non-blocking algorithms, especially for
implementing shared counters and queues, make sure that if a process blocks
within a critical region, it can be brought back to the beginning of the
critical region and another process is allowed to enter the region. An example
of this technique, for queues, was seen earlier in these notes.]
Of course, all blocking locks [like spinlocks and semaphores] can give
rise to deadlocks.
In general, spinlocks with their busy-waiting mechanism are desirable
(as compared to non-busy-waiting mechanisms like semaphores) when the
scheduling and context-switching time is greater than the time required to
execute the critical region, or when there are processors not required
for other tasks.
The efficient implementation of spinlocks is a great concern because
spinlocks are the usual foundation for synchronization, especially in
multiprocessor systems.
They are used everywhere and very frequently. As the use of multiprocessor
systems increases, so will the interest in efficient spinlocks and in
alternative synchronization mechanisms.
ingargiola.cis.temple.edu