OSTEP: Locks

Locks

By now we have a good idea of why we need locks: to ensure that only one thread is in a given critical section at a time. This is why the POSIX threads library calls its locks mutexes: they enforce mutual exclusion among threads. Now we'll look at their implementation.

Evaluation

Basic evaluation criteria for locks:

  • mutual exclusion: does this lock work? does it prevent multiple threads from entering a critical section?
  • fairness: does each thread requesting a lock have a fair shot? do any threads starve, i.e. never obtain it?
  • performance: what are the performance consequences in various situations: a single thread with no contention, multiple threads contending on a single CPU, multiple threads contending across multiple CPUs?

Implementations

First thing to note: it is not possible to do this without help from the hardware (and, later, the OS). Without new atomic primitives for locking, any sequence of ordinary load and store instructions is at the mercy of an inopportune context switch.

Spin locks with Test-And-Set

The hardware can offer a test-and-set instruction (aka atomic exchange). The exchange looks like test_and_set(int *old_ptr, int new_value): it atomically replaces the old value at *old_ptr with the new one and returns the old value.
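
As a sketch of its semantics in C pseudocode (on real hardware this whole routine executes as a single atomic instruction; written as plain C it would not be atomic):

int test_and_set(int *old_ptr, int new_value) {
	int old = *old_ptr;   // fetch the old value at old_ptr
	*old_ptr = new_value; // store new_value into old_ptr
	return old;           // return what was there before
}

A simple spin lock built on this primitive: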

typedef struct __lock_t {
	int flag; // 0 => unlocked, 1 => locked
} lock_t;

void init(lock_t *lock) {
	lock->flag = 0; // start unlocked
}

void lock(lock_t *lock) {
	// test_and_set returns the old flag: 1 means another thread holds
	// the lock, so keep spinning; 0 means it was free and we grabbed it
	while (test_and_set(&lock->flag, 1) == 1)
		; // spin, do nothing
}

void unlock(lock_t *lock) {
	lock->flag = 0;
}
  • Requires a preemptive scheduler on a single CPU (otherwise a thread spinning on a held lock never yields, and the holder never gets to run again and release it).
  • However, spinning wastes a ton of CPU cycles (especially on a single CPU, where a waiting thread can burn its entire time slice) and should be avoided.
  • Evaluation:
    • Mutual exclusion: yes, it's correct.
    • Fairness: no guarantees; a spinning thread can starve forever.
    • Performance: spinning is painful on a single CPU, though it's not that bad across multiple CPUs when critical sections are short. (A runnable version is sketched below.)
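
A runnable version of this lock, as a minimal sketch assuming C11 atomics (stdatomic.h) and pthreads; atomic_exchange plays the role of the test-and-set instruction, and the names (spin_lock, worker, etc.) and the counter demo are mine, not OSTEP's. Compile with cc -pthread:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

typedef struct {
	atomic_int flag; // 0 => unlocked, 1 => locked
} spinlock_t;

void spin_init(spinlock_t *l)   { atomic_store(&l->flag, 0); }
void spin_lock(spinlock_t *l)   { while (atomic_exchange(&l->flag, 1) == 1) ; }
void spin_unlock(spinlock_t *l) { atomic_store(&l->flag, 0); }

spinlock_t lock;
long counter = 0;

void *worker(void *arg) {
	for (int i = 0; i < 1000000; i++) {
		spin_lock(&lock);
		counter++; // the critical section
		spin_unlock(&lock);
	}
	return NULL;
}

int main(void) {
	pthread_t t1, t2;
	spin_init(&lock);
	pthread_create(&t1, NULL, worker, NULL);
	pthread_create(&t2, NULL, worker, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("counter = %ld\n", counter); // always 2000000 with the lock held
	return 0;
}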

Spin locks with Compare-And-Swap

This instruction looks like int compare_and_swap(int *ptr, int expected, int new_value): if the current value at *ptr is equal to expected, it is atomically updated to new_value; in either case, the actual old value is returned.

void lock(lock_t *lock) {
	// only set flag to locked (1) if the flag is unlocked (0);
	// if the returned old value is 1, the lock was still held, so spin
	while (compare_and_swap(&lock->flag, 0, 1) == 1)
		; // spin, do nothing
}
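
For reference, C11 exposes compare-and-swap as atomic_compare_exchange_strong; here is a minimal sketch of the same lock using it (the names are mine). One quirk: the C11 call writes the observed value back into expected on failure, so we reset it each iteration:

#include <stdatomic.h>

typedef struct {
	atomic_int flag; // 0 => unlocked, 1 => locked
} cas_lock_t;

void cas_lock(cas_lock_t *lock) {
	int expected = 0;
	// try to atomically flip flag from 0 (unlocked) to 1 (locked)
	while (!atomic_compare_exchange_strong(&lock->flag, &expected, 1))
		expected = 0; // failed: expected now holds the current value; reset and retry
}

void cas_unlock(cas_lock_t *lock) {
	atomic_store(&lock->flag, 0);
}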

Ticket locks with Fetch-And-Add

The int fetch_and_add(int *ptr) instruction atomically increments the value at *ptr and returns its old value from before the increment.

typedef struct __lock_t {
	// the next ticket to hand out
	// (just like taking a numbered ticket when waiting in line)
	int ticket;
	// the ticket currently being served
	int turn;
} lock_t;

void init(lock_t *lock) {
	lock->ticket = 0;
	lock->turn = 0;
}

void lock(lock_t *lock) {
	// atomically grab the next ticket...
	int myturn = fetch_and_add(&lock->ticket);
	// ...then wait until our number is called
	while (lock->turn != myturn)
		; // spin
}

void unlock(lock_t *lock) {
	lock->turn = lock->turn + 1;
}

This method clearly has much better fairness: tickets are handed out in order, so threads acquire the lock FIFO and no thread starves.
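
A runnable sketch using C11's atomic_fetch_add, which takes an explicit increment, unlike the one-argument fetch_and_add above (the names are mine):

#include <stdatomic.h>

typedef struct {
	atomic_int ticket; // next ticket to hand out
	atomic_int turn;   // ticket currently being served
} ticket_lock_t;

void ticket_init(ticket_lock_t *lock) {
	atomic_store(&lock->ticket, 0);
	atomic_store(&lock->turn, 0);
}

void ticket_lock(ticket_lock_t *lock) {
	int myturn = atomic_fetch_add(&lock->ticket, 1); // take a ticket
	while (atomic_load(&lock->turn) != myturn)
		; // spin until our number is called
}

void ticket_unlock(ticket_lock_t *lock) {
	atomic_fetch_add(&lock->turn, 1); // serve the next ticket
}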

Avoiding spinning

So we've been able to implement correct locks with new hardware instructions. But we need help from the OS if we want to avoid spinning and have more performant locks.

For example, Solaris has the syscalls park() to put the calling thread to sleep and unpark(thread_id) to wake a given thread. A more sophisticated lock can be implemented by also storing a queue of thread ids: when a thread requests a held lock, instead of spinning, it adds its own id to the queue and parks itself; when unlocking, we check the queue for the next thread id and unpark it. (See Fig 28.9 in OSTEP; the full version also uses a small guard spin lock to protect the queue, and setpark() to avoid the race where a wakeup arrives just before the thread parks.)

In Linux we have the futex: each futex is associated with a physical memory address and an in-kernel wait queue. There are two relevant syscalls:

  • futex_wait(address, expected) puts the calling thread to sleep if the value at address equals expected (otherwise it returns immediately)
  • futex_wake(address) wakes one thread waiting on the futex's queue
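
A simplified, Linux-only sketch of a lock built on these calls (the names are mine; the raw futex syscall has no libc wrapper, so it goes through syscall(2)). A production lock like glibc's also tracks whether any waiters exist, so the uncontended unlock can skip the FUTEX_WAKE syscall; this sketch just always issues it:

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct {
	atomic_int state; // 0 => unlocked, 1 => locked
} futex_lock_t;

static long futex(atomic_int *addr, int op, int val) {
	return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void futex_lock(futex_lock_t *lock) {
	int expected = 0;
	while (!atomic_compare_exchange_strong(&lock->state, &expected, 1)) {
		// lock is held: sleep, unless state already changed from 1 meanwhile
		futex(&lock->state, FUTEX_WAIT, 1);
		expected = 0;
	}
}

void futex_unlock(futex_lock_t *lock) {
	atomic_store(&lock->state, 0);
	futex(&lock->state, FUTEX_WAKE, 1); // wake one waiter, if any
}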

Concurrent data structures

Takeaway from Ch 29: it's pretty trivial to make data structures thread safe: just wrap each operation in a lock and unlock, making one big critical section. However, this is often not performant, so it usually makes sense to minimize the critical sections and wrap only the parts that truly are critical, allowing more concurrency. A sketch of the basic approach is below.
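
A minimal sketch of the big-lock approach using pthreads, in the spirit of the concurrent counter from Ch 29 (the names are mine):

#include <pthread.h>

typedef struct {
	int value;
	pthread_mutex_t lock;
} counter_t;

void counter_init(counter_t *c) {
	c->value = 0;
	pthread_mutex_init(&c->lock, NULL);
}

void counter_increment(counter_t *c) {
	pthread_mutex_lock(&c->lock);
	c->value++; // the critical section is just this line
	pthread_mutex_unlock(&c->lock);
}

int counter_get(counter_t *c) {
	pthread_mutex_lock(&c->lock);
	int v = c->value;
	pthread_mutex_unlock(&c->lock);
	return v;
}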