Operating System: Three Easy Pieces --- Locks: Test and Set (Note)

Because disabling interrupts does not work on multiple processors, system designers started to invent hardware support for locking. The earliest multiprocessor systems, such as the Burroughs B5000 in the early 1960s, had such support; today all systems provide this type of support, even for single-CPU systems.

The simplest bit of hardware support to understand is what is known as a test-and-set instruction, also known as atomic exchange. To understand how test-and-set works, let's first try to build a simple lock without it. In this failed attempt, we use a simple flag variable to denote whether the lock is held or not.

In this first attempt, the idea is quite simple: use a simple variable to indicate whether some thread has possession of the lock. The first thread that enters the critical section will call lock(), which tests whether the flag is equal to 1 (in this case, it is not), and then sets the flag to 1 to indicate that the thread now holds the lock. When finished with the critical section, the thread calls unlock() and clears the flag, thus indicating that the lock is no longer held.

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t* mutex) {
    // 0 -> lock is available, 1 -> lock is held
    mutex->flag = 0;
}

void lock(lock_t* mutex) {
    while (mutex->flag == 1)  // TEST the flag
        ;                     // spin-wait (do nothing)
    mutex->flag = 1;          // now SET it!
}

void unlock(lock_t* mutex) {
    mutex->flag = 0;
}

If another thread happens to call lock() while that first thread is in the critical section, it will simply spin-wait in the while loop for that thread to call unlock() and clear the flag. Once the first thread does so, the waiting thread will fall out of the while loop, set the flag to 1 for itself, and proceed into the critical section.

Unfortunately, the code has two problems: one of correctness, and another of performance. The correctness problem is simple to see once you get used to thinking about concurrent programming. Imagine this interleaving of the code; assume flag = 0 to begin.
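Here is one such problematic trace (a reconstruction along the lines of the book's Figure 28.2):

    Thread 1                               Thread 2
    --------                               --------
    call lock()
    while (flag == 1)  // flag is 0, fall out of the loop
    [interrupt: switch to Thread 2]
                                           call lock()
                                           while (flag == 1)  // flag is still 0
                                           flag = 1;          // "acquires" the lock
                                           [interrupt: switch to Thread 1]
    flag = 1;  // sets flag to 1 (too!)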

As you can see from this interleaving, with timely (untimely?) interrupts, we can easily produce a case where both threads set the flag to 1 and both threads are thus able to enter the critical section. This behavior is what professionals call "bad" - we have obviously failed to provide the most basic requirement: providing mutual exclusion.

The performance problem, which we will address more later on, is the way a thread waits to acquire a lock that is already held: it endlessly checks the value of flag, a technique known as spin-waiting. Spin-waiting wastes time waiting for another thread to release a lock. The waste is exceptionally high on a uniprocessor, where the thread that the waiter is waiting for cannot even run (at least, until a context switch occurs!). Thus, as we move forward and develop more sophisticated solutions, we should also consider ways to avoid this kind of waste.

                    Building A Working Spin Lock

While the idea behind the example above is a good one, it is not possible to implement without some support from the hardware. Fortunately, some systems provide an instruction to support the creation of simple locks based on this concept. This more powerful instruction has different names -- on SPARC, it is the load/store unsigned byte instruction (ldstub), whereas on x86, it is the atomic exchange instruction (xchg) -- but it basically does the same thing across platforms, and is generally referred to as test-and-set. We define what the test-and-set instruction does with the following C code snippet:

int TestAndSet(int* old_ptr, int new) {
    int old = *old_ptr;  // fetch old value at old_ptr
    *old_ptr = new;      // store 'new' into old_ptr
    return old;          // return the old value
}
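Note that as plain C, the snippet above is not atomic: an interrupt could split the load from the store. On real hardware the pair executes as a single indivisible instruction. In portable code, one way to get equivalent behavior (a sketch, assuming a C11 compiler with <stdatomic.h>; this wrapper is ours, not the book's) is:

#include <stdatomic.h>

// Same semantics as the snippet above, but truly atomic: returns the
// old value while storing 'new' in one indivisible step.
int TestAndSet(atomic_int* old_ptr, int new) {
    return atomic_exchange(old_ptr, new);
}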

What the test-and-set instruction does is as follows. It returns the old value pointed to by old_ptr, and simultaneously updates said value to new. The key, of course, is that this sequence of operations is performed atomically. The reason it is called test-and-set is that it enables you to test the old value (which is what is returned) while simultaneously setting the memory location to a new value; as it turns out, this slightly more powerful instruction is enough to build a simple spin lock, as we now examine in Figure 28.3. Or better yet: figure it out first yourself!

Let's make sure we understand why this lock works. Imagine first the case where a thread calls lock() and no other thread currently holds the lock; thus, flag should be 0. When the thread calls TestAndSet(flag, 1), the routine will return the old value of flag, which is 0; thus, the calling thread, which is testing the value of flag, will not get caught spinning in the while loop and will acquire the lock. The thread will also atomically set the value to 1, thus indicating that the lock is now held. When the thread is finished with its critical section, it calls unlock() to set the flag back to zero.

typedef struct __lock_t {
    int flag;
} lock_t;

void init(lock_t* lock) {
    // 0 -> lock is available, 1 -> lock is held
    lock->flag = 0;
}

void lock(lock_t* lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ;  // spin-wait (do nothing)
}

void unlock(lock_t* lock) {
    lock->flag = 0;
}

The second case we can imagine arises when one thread already has the lock held (i.e., flag is 1). In this case, this thread will call lock() and then call TestAndSet(flag, 1) as well. This time, TestAndSet() will return the old value at flag, which is 1 (because the lock is held), while simultaneously setting it to 1 again. As long as the lock is held by another thread, TestAndSet() will repeatedly return 1, and thus this thread will spin and spin until the lock is finally released. When the flag is finally set to 0 by some other thread, this thread will call TestAndSet() again, which will now return 0 while atomically setting the value to 1, and thus acquire the lock and enter the critical section.

By making both the test of the old lock value and the set of the new value a single atomic operation, we ensure that only one thread acquires the lock. And that's how to build a working mutual exclusion primitive!

You may also now understand why this type of lock is usually referred to as a spin lock. It is the simplest type of lock to build, and it simply spins, using CPU cycles, until the lock becomes available. To work correctly on a single processor, it requires a preemptive scheduler (i.e., one that will interrupt a thread via a timer, in order to run a different thread, from time to time). Without preemption, spin locks don't make much sense on a single CPU, as a thread spinning on a CPU will never relinquish it.
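As a preview of one refinement the book develops later, the waiter can at least give up the CPU instead of burning its whole timeslice spinning. A hedged sketch, using the POSIX sched_yield() call on the lock from Figure 28.3:

#include <sched.h>  // for sched_yield()

void lock(lock_t* lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        sched_yield();  // give up the CPU instead of spinning
}

This does not fix spin locks in general, but on a single CPU it lets the lock holder run (and release the lock) sooner.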

                  TIP: Think About Concurrency As A Malicious Scheduler

From this example, you might get a sense of the approach you need to take to understand concurrent execution. What you should try to do is to pretend you are a malicious scheduler, one that interrupts threads at the most inopportune of times in order to foil their feeble attempts at building synchronization primitives. What a mean scheduler you are! Although the exact sequence of interrupts may be improbable, it is possible, and that is all we need to demonstrate that a particular approach does not work. It can be useful to think maliciously! (At least, sometimes.)
