There is a coding bug where a piece of code tries to grab a spinlock twice: it will spin forever, waiting for the lock to be released (spinlocks, rwlocks and semaphores are not recursive in Linux). This is trivial to diagnose: not a stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.
For a slightly more complex case, imagine you have a region shared by a softirq and user context. If you use a spin_lock() call to protect it, it is possible that the user context will be interrupted by the softirq while it holds the lock, and the softirq will then spin forever trying to get the same lock.
Both of these are examples of deadlock, and as shown above, deadlock can occur even with a single CPU (although not on UP compiles, since spinlocks vanish on kernel compiles with CONFIG_SMP=n; you'll still get data corruption in the second example).
This complete lockup is easy to diagnose: on SMP boxes, the watchdog timer, or compiling with DEBUG_SPINLOCKS set (include/linux/spinlock.h), will expose it as soon as it happens.
A more complex problem is the so-called 'deadly embrace', involving two or more locks. Say you have a hash table in which each entry is a spinlock and a chain of hashed objects. Inside a softirq handler, you sometimes want to move an object from one chain in the hash to another: you grab the spinlock of the old hash chain and the spinlock of the new hash chain, delete the object from the old chain, and insert it in the new one.
There are two problems here. First, if your code ever tries to move the object to the same chain, it will deadlock with itself as it tries to lock that chain twice. Second, if the same softirq on another CPU is trying to move another object in the reverse direction, the following could happen:
Table 7-1. Consequences
CPU 1 | CPU 2 |
---|---|
Grab lock A -> OK | Grab lock B -> OK |
Grab lock B -> spin | Grab lock A -> spin |
The two CPUs will spin forever, each waiting for the other to give up its lock. It will look, smell, and feel like a crash.