Introduction
This post explores a rare but critical deadlock scenario in the Linux kernel where improper spinlock usage in interrupt context leads to system instability. The issue arises when a spinlock that an interrupt handler takes is also taken elsewhere without disabling interrupts, or when nested spinlocks are acquired in inconsistent order. Either way, a CPU ends up spinning on a lock that can never be released, and the system freezes.
Symptoms
System Hangs and Unresponsive Behavior
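The system becomes unresponsive: no new kernel messages appear and user space makes no progress. Often the only indication, if anything reaches the log or a serial console at all, is a lockup report such as a watchdog “BUG: soft lockup” or “hard LOCKUP” message, or, on older kernels built with CONFIG_DEBUG_SPINLOCK, “BUG: spinlock lockup suspected”. A kernel panic follows only if the kernel is configured to panic on lockups.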
High CPU Utilization
If other CPUs are still responsive, top or htop may show one core pinned at 100%, busy spinning on the lock rather than doing useful work.
Kernel Logs and Stack Traces
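dmesg output might show a line like the following (illustrative only; the exact wording depends on kernel version and configuration):
[ 123.456789] BUG: spinlock lockup in interrupt context: <lock> (some lock class), at <function>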
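When a lockup does get logged, for example to a serial console, a quick filter for known report signatures helps separate it from unrelated noise. A minimal userspace C sketch, assuming the substrings listed below; the list is incomplete and version-dependent, and is_lockup_report() is a hypothetical helper, not part of any kernel or util-linux tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Substrings assumed to appear in kernel lockup and lockdep reports.
 * Incomplete and version-dependent: a starting point only. */
static const char *const lockup_signatures[] = {
	"soft lockup",                          /* softlockup watchdog */
	"hard LOCKUP",                          /* hardlockup (NMI) watchdog */
	"spinlock lockup",                      /* older CONFIG_DEBUG_SPINLOCK */
	"spinlock recursion",                   /* CONFIG_DEBUG_SPINLOCK */
	"possible circular locking dependency", /* lockdep: lock ordering */
	"inconsistent lock state",              /* lockdep: IRQ-unsafe usage */
};

/* Hypothetical helper: returns 1 if a log line contains any known signature. */
static int is_lockup_report(const char *line)
{
	size_t n = sizeof(lockup_signatures) / sizeof(lockup_signatures[0]);

	for (size_t i = 0; i < n; i++) {
		if (strstr(line, lockup_signatures[i]) != NULL)
			return 1;
	}
	return 0;
}
```

Lines from the softlockup watchdog or lockdep are flagged, while ordinary driver messages pass through; feed it one captured line at a time.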
Root Cause Analysis
Spinlock Usage in Interrupt Context
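Spinlocks are designed for short, non-preemptible critical sections, and they are not recursive: a CPU that tries to take a spinlock it already holds spins forever. Two rules follow for locks used in interrupt context. First, if an interrupt handler takes a lock, every other path that takes the same lock must disable local interrupts while holding it (for example with spin_lock_irqsave()); otherwise the interrupt can arrive on the CPU that holds the lock, and the handler spins on a lock whose holder cannot run again until the handler returns. Second, nesting spinlocks is allowed, but every path must acquire them in the same order: if one path takes lock1 then lock2 while another takes lock2 then lock1, two CPUs can each end up waiting for the other, a circular dependency that never resolves.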
Example Code Vulnerability
A hypothetical driver might have the following problematic code:
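#include <linux/interrupt.h>
#include <linux/spinlock.h>
static DEFINE_SPINLOCK(lock1);
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	spin_lock(&lock1);	/* hard-IRQ handler: local IRQs are already off here */
	helper_that_takes_lock2();	/* hypothetical helper; internally calls spin_lock(&lock2) */
	spin_unlock(&lock1);
	return IRQ_HANDLED;
}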
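If lock2 is also taken in process context with plain spin_lock() (interrupts left enabled), and this interrupt fires on the same CPU while that code holds lock2, the handler spins forever: the interrupted code cannot resume to release lock2 until the handler returns. The conflict is per CPU, not per thread.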
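The same-CPU case described above can be modelled in ordinary userspace C. This is a minimal sketch under stated assumptions, not kernel code: a single simulated CPU, a boolean irqs_enabled standing in for the local interrupt mask, an atomic_flag standing in for lock2, and hypothetical functions process_context(), raise_irq() and irq_handler(). Because a real spin would hang the program, the handler uses a trylock and reports failure instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of ONE CPU (hypothetical; not kernel code). */
static atomic_flag lock2 = ATOMIC_FLAG_INIT; /* test-and-set "spinlock" */
static bool irqs_enabled = true;             /* local interrupt mask */
static bool irq_pending;                     /* IRQ raised while masked */

static bool try_lock(atomic_flag *l) { return !atomic_flag_test_and_set(l); }
static void unlock(atomic_flag *l)   { atomic_flag_clear(l); }

/* The handler takes lock2. A real spin_lock() would spin forever if this CPU
 * already holds lock2, since the holder cannot resume until the handler
 * returns; the model uses a trylock so it can report that instead. */
static bool irq_handler(void)
{
	if (!try_lock(&lock2))
		return false;                        /* would self-deadlock */
	unlock(&lock2);
	return true;
}

/* The device raises an interrupt: run the handler now, or defer if masked. */
static bool raise_irq(void)
{
	if (!irqs_enabled) {
		irq_pending = true;
		return true;
	}
	return irq_handler();
}

/* Process-context critical section on lock2. use_irqsave selects between
 * spin_lock()/spin_unlock() and spin_lock_irqsave()/spin_unlock_irqrestore(). */
static bool process_context(bool use_irqsave)
{
	bool saved = irqs_enabled, ok;

	if (use_irqsave)
		irqs_enabled = false;                /* mask IRQs before taking the lock */
	try_lock(&lock2);                        /* lock2 is free here, so this succeeds */
	ok = raise_irq();                        /* interrupt arrives mid-section */
	unlock(&lock2);                          /* on real hardware, never reached if !ok */
	irqs_enabled = saved;                    /* restore the saved IRQ state */
	if (irq_pending && irqs_enabled) {
		irq_pending = false;
		ok = ok && irq_handler();            /* deferred IRQ is delivered now */
	}
	return ok;
}
```

process_context(false) mimics plain spin_lock() and reports a self-deadlock; process_context(true) mimics spin_lock_irqsave(), where the interrupt stays pending until the lock is released and then finds lock2 free.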
Diagnosis Tools
Kernel Debugging Tools
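Use gdb with the kernel’s vmlinux file (which carries the debug symbols) to inspect the stack of the hung CPU. This needs a live debug stub, such as kgdb or a QEMU guest’s gdbstub (in the latter, each gdb thread is a virtual CPU). Illustrative output (frame names and paths vary by kernel version):
(gdb) thread 1
(gdb) bt
#0 spin_lock_irqsave () at include/linux/spinlock_api.h:123
#1 some_function () at drivers/some_driver.c:45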
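The lockdep lock validator can detect inconsistent lock ordering and IRQ-unsafe lock usage at runtime. There is no /proc/sys/kernel/lockdep switch: lockdep has to be built in with CONFIG_PROVE_LOCKING, and it writes its reports to the kernel log. To check whether the running kernel has it (on distributions that install the config under /boot) and to read its statistics as root:
grep CONFIG_PROVE_LOCKING /boot/config-$(uname -r)
cat /proc/lockdep_stats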
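The ordering half of what lockdep checks can be illustrated with a toy lock-order graph: whenever a lock is taken while others are held, record an edge from each held lock to the new one, and flag any acquisition that would close a cycle. A minimal userspace sketch, assuming small integer lock IDs; struct ctx, acquire() and release() are hypothetical, and real lockdep works on lock classes with far more state than this.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 16
#define MAX_HELD  8

/* Toy lock-order graph (hypothetical; not lockdep's implementation).
 * edge[a][b] means "b was acquired while a was held". */
static bool edge[MAX_LOCKS][MAX_LOCKS];

struct ctx {                 /* one execution context: a task or an IRQ */
	int held[MAX_HELD];
	int depth;
};

/* Depth-first search: is there a path from 'from' to 'to'? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < MAX_LOCKS; n++)
		if (edge[from][n] && !seen[n] && reaches(n, to, seen))
			return true;
	return false;
}

/* Record acquiring 'lock' in context c. Returns false if a path
 * lock ->* held already exists, i.e. the new edge closes a cycle
 * (an ABBA pattern, or re-taking a lock that is already held). */
static bool acquire(struct ctx *c, int lock)
{
	bool ok = true;

	assert(c->depth < MAX_HELD);
	for (int i = 0; i < c->depth; i++) {
		int held = c->held[i];
		bool seen[MAX_LOCKS] = { false };

		if (reaches(lock, held, seen))
			ok = false;
		edge[held][lock] = true;
	}
	c->held[c->depth++] = lock;
	return ok;
}

/* Release the most recently acquired lock (LIFO order assumed). */
static void release(struct ctx *c)
{
	c->depth--;
}
```

Taking lock 0 then 1 in one context and 1 then 0 in another makes the second acquisition return false, the circular dependency described in the root-cause section, even though neither context actually blocked.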
System Monitoring Utilities
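perf and ftrace help trace lock contention patterns, provided the system stays responsive long enough to record. On kernels that provide the lock:contention_begin and lock:contention_end tracepoints (5.19 and later), for example:
perf record -a -g -e sched:sched_switch -e lock:contention_begin -e lock:contention_end -- sleep 10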
Step-by-Step Solution
1. Identify the Locking Pattern
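Note that /proc/locks lists file locks (flock and POSIX record locks), not kernel spinlocks, so it cannot show this problem. Instead, capture backtraces of all active CPUs, for example with SysRq-l (echo l > /proc/sysrq-trigger, as root), and cross-reference them with the lockup or lockdep messages in dmesg.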
2. Modify Spinlock Usage
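Any lock that an interrupt handler also takes must be acquired with local interrupts disabled in process context, so the handler cannot interrupt the lock holder on the same CPU. Where two spinlocks must be nested, always take them in the same order; otherwise restructure the code so the second lock is not taken while the first is held. For example, in process context:
unsigned long flags;
spin_lock_irqsave(&lock2, flags);	/* disables local IRQs, then takes lock2 */
/* ... short critical section; never take lock1 here, since the handler takes lock1 before lock2 ... */
spin_unlock_irqrestore(&lock2, flags);	/* releases lock2, restores the saved IRQ state */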
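When two spinlocks genuinely must be nested, the usual rule is a single global acquisition order. A minimal sketch of that rule in userspace C, using POSIX pthread spinlocks as a stand-in for kernel spinlocks and address order as the (assumed) global order; lock_pair(), unlock_pair() and run_demo() are hypothetical helpers, not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Take two spinlocks in one global order (here: by address), whichever
 * order the caller names them in, so no ABBA cycle can form. */
static void lock_pair(pthread_spinlock_t *a, pthread_spinlock_t *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_spinlock_t *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_spin_lock(a);
	if (b != a)
		pthread_spin_lock(b);
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_spinlock_t *a, pthread_spinlock_t *b)
{
	pthread_spin_unlock(a);
	if (b != a)
		pthread_spin_unlock(b);
}

static pthread_spinlock_t lock1, lock2;
static long counter;             /* protected by both locks */

/* Each worker names the locks in a different order; without the
 * reordering in lock_pair() this is the classic ABBA pattern. */
static void *worker(void *arg)
{
	int reversed = *(int *)arg;

	for (int i = 0; i < 100000; i++) {
		if (reversed)
			lock_pair(&lock2, &lock1);
		else
			lock_pair(&lock1, &lock2);
		counter++;
		unlock_pair(&lock1, &lock2);
	}
	return NULL;
}

/* Runs both workers to completion; error checks omitted for brevity. */
static long run_demo(void)
{
	pthread_t t1, t2;
	int fwd = 0, rev = 1;

	counter = 0;
	pthread_spin_init(&lock1, PTHREAD_PROCESS_PRIVATE);
	pthread_spin_init(&lock2, PTHREAD_PROCESS_PRIVATE);
	pthread_create(&t1, NULL, worker, &fwd);
	pthread_create(&t2, NULL, worker, &rev);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_spin_destroy(&lock1);
	pthread_spin_destroy(&lock2);
	return counter;
}
```

run_demo() finishes with every increment accounted for. In kernel code the order is often a documented convention (always lock1 before lock2) rather than something computed at runtime.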
3. Use Mutexes for Long-Running Locks
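Replace spinlocks with mutex_lock() where the critical section is long or may sleep. A mutex must never be taken in hard-interrupt context, so if the long work is triggered by an interrupt, move it out of the handler first, for example into the thread function of request_threaded_irq() or into a workqueue, where sleeping is allowed.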
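The shape of that split can be sketched in userspace C. This is an analogy under stated assumptions: top_half(), bottom_half() and run_demo() are hypothetical, a POSIX semaphore stands in for the kernel waking the handler thread, and a pthread spinlock and mutex stand in for their kernel counterparts. The top half only queues the event under a short spinlock section; all mutex-protected work happens in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define RING 64

/* Hypothetical model of a split interrupt handler (not kernel code). */
static pthread_spinlock_t ring_lock;            /* short sections only */
static pthread_mutex_t state_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t events;                            /* counts queued events */
static int ring[RING];
static unsigned head, tail;                     /* protected by ring_lock */
static long processed_sum;                      /* protected by state_mutex */

/* Top half: must not sleep, so it holds only the spinlock, briefly. */
static void top_half(int event)
{
	pthread_spin_lock(&ring_lock);
	ring[head++ % RING] = event;                /* sketch: assumes no overflow */
	pthread_spin_unlock(&ring_lock);
	sem_post(&events);                          /* wake the worker thread */
}

/* Bottom half: thread context, so taking a mutex (and sleeping) is fine.
 * A negative event is used as a stop marker. */
static void *bottom_half(void *arg)
{
	(void)arg;
	for (;;) {
		int event;

		sem_wait(&events);
		pthread_spin_lock(&ring_lock);
		event = ring[tail++ % RING];
		pthread_spin_unlock(&ring_lock);
		if (event < 0)
			return NULL;
		pthread_mutex_lock(&state_mutex);       /* long work would go here */
		processed_sum += event;
		pthread_mutex_unlock(&state_mutex);
	}
}

/* Queue events 1..n_events (n_events < RING), then stop and join. */
static long run_demo(int n_events)
{
	pthread_t worker;

	processed_sum = 0;
	head = tail = 0;
	pthread_spin_init(&ring_lock, PTHREAD_PROCESS_PRIVATE);
	sem_init(&events, 0, 0);
	pthread_create(&worker, NULL, bottom_half, NULL);
	for (int i = 1; i <= n_events; i++)
		top_half(i);
	top_half(-1);
	pthread_join(worker, NULL);
	sem_destroy(&events);
	pthread_spin_destroy(&ring_lock);
	return processed_sum;
}
```

With ten events, the worker processes all of them in order before seeing the stop marker, so run_demo(10) returns the sum of 1 through 10.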
4. Test with Lockdep
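Enable CONFIG_PROVE_LOCKING (which selects CONFIG_LOCKDEP) in the kernel config and reproduce the issue. Lockdep reports the potential deadlock, for example as “WARNING: inconsistent lock state” or “WARNING: possible circular locking dependency detected”, with stack traces of the acquisitions involved, often the first time the bad pattern occurs rather than only when the system actually hangs.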
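Lockdep can flag the interrupt-context bug without the deadlock ever occurring because it remembers, per lock class, the contexts in which a lock has been taken. A toy sketch of that bookkeeping in userspace C, assuming a fixed table of integer lock IDs; note_acquire() and the two usage bits are hypothetical simplifications of lockdep's real state tracking.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 16

/* Hypothetical usage bits, loosely modelled on lockdep's IRQ-state check:
 * a lock ever taken in hard-IRQ context must never be taken with hard IRQs
 * enabled, or the IRQ can interrupt its holder on the same CPU. */
enum usage {
	USED_IN_HARDIRQ = 1u << 0,  /* acquired inside an interrupt handler */
	HARDIRQ_ENABLED = 1u << 1,  /* acquired while local IRQs were enabled */
};

static unsigned usage_bits[MAX_LOCKS];

/* Record one acquisition of 'lock'. Returns false once the lock has been
 * seen in both states, even if no deadlock happened during this run. */
static bool note_acquire(int lock, bool in_hardirq, bool irqs_enabled)
{
	if (in_hardirq)
		usage_bits[lock] |= USED_IN_HARDIRQ;
	else if (irqs_enabled)
		usage_bits[lock] |= HARDIRQ_ENABLED;
	return !((usage_bits[lock] & USED_IN_HARDIRQ) &&
		 (usage_bits[lock] & HARDIRQ_ENABLED));
}
```

A lock taken with plain spin_lock() in process context and later inside the handler is flagged on the handler's acquisition; the same lock taken with spin_lock_irqsave() in process context is not.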
5. Validate with Kernel Modules
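After rebuilding, unload the old module with modprobe -r (or rmmod) and load the new one with modprobe; if the old module is stuck in the deadlock, a reboot is needed instead. Use modinfo to confirm that the installed module is the rebuilt one (for example by checking its version or srcversion field, where present), then rerun the workload on the lockdep-enabled kernel and check that dmesg stays free of lockdep and lockup reports.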