Linux Kernel Deadlock in Interrupt Context Due to Improper Spinlock Usage

Introduction

This post explores a rare but critical deadlock in the Linux kernel caused by improper spinlock usage in interrupt context. The problem appears when an interrupt handler tries to take a spinlock that the code it interrupted already holds, or when two code paths take the same pair of spinlocks in opposite orders. In both cases each party waits on a lock that can never be released, and the affected CPU (and often the whole system) freezes.

Symptoms

System Hangs and Unresponsive Behavior

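The system becomes unresponsive, with little or no user-space activity. If the console still works and the lockup detectors are enabled, the kernel log may show a soft or hard lockup report from the watchdog; on older kernels built with CONFIG_DEBUG_SPINLOCK, a “BUG: spinlock lockup suspected” message may appear instead. If every CPU is stuck with interrupts disabled, nothing may be logged at all.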

High CPU Utilization

If only one CPU is affected and the rest of the system still responds, top or htop may show that core pinned at 100% utilization (mostly system time), with a task spinning on the lock.

Kernel Logs and Stack Traces

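Dmesg output might show a watchdog report such as the following (angle brackets are placeholders; the exact wording depends on kernel version and configuration):

[  123.456789] watchdog: BUG: soft lockup - CPU#<n> stuck for <t>s! [<comm>:<pid>]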

Root Cause Analysis

Spinlock Usage in Interrupt Context

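Spinlocks are designed for short critical sections that never sleep. Nesting spinlocks is legal, but two rules apply. First, any lock that an interrupt handler takes must be acquired with interrupts disabled (spin_lock_irqsave() or spin_lock_irq()) everywhere else; otherwise the handler can interrupt the lock holder on the same CPU and spin forever on a lock that cannot be released until the handler returns. Second, every code path must acquire nested locks in the same order; if one path takes lock1 then lock2 while another takes lock2 then lock1, each can end up waiting for the other.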

Example Code Vulnerability

A hypothetical driver might have the following problematic code:

static irqreturn_t irq_handler(int irq, void *dev_id)
{
    spin_lock(&lock1);
    /* Risky: calls a function that takes spin_lock(&lock2) */
    spin_unlock(&lock1);
    return IRQ_HANDLED;
}

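If the handler interrupts process-context code on the same CPU that holds lock2 (taken with plain spin_lock(), so interrupts stay enabled), the handler spins on lock2 forever, because the holder cannot run again until the handler returns. The same hang can occur across CPUs if another code path takes lock2 first and then lock1.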
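The same-CPU case can be reasoned about with a small user-space model. The sketch below is not kernel code: struct cpu_model and the model_* helpers are hypothetical names invented for illustration, and a single shared lock stands in for lock2. It only tracks two facts about one CPU (whether interrupts can be delivered and whether the shared lock is held) and asks whether a handler that needs that lock would spin forever if it fired now.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; user-space illustration only, not kernel code. */
struct cpu_model {
    bool irqs_enabled; /* can an interrupt be delivered right now? */
    bool lock_held;    /* is the shared lock held by the running code? */
};

/* Models spin_lock(): takes the lock and leaves the interrupt state alone. */
void model_spin_lock(struct cpu_model *cpu)
{
    assert(!cpu->lock_held); /* a second acquisition here would already be a deadlock */
    cpu->lock_held = true;
}

/* Models spin_unlock(). */
void model_spin_unlock(struct cpu_model *cpu)
{
    assert(cpu->lock_held);
    cpu->lock_held = false;
}

/* Models spin_lock_irqsave(): save the interrupt state, disable interrupts, lock. */
void model_spin_lock_irqsave(struct cpu_model *cpu, bool *flags)
{
    *flags = cpu->irqs_enabled;
    cpu->irqs_enabled = false;
    model_spin_lock(cpu);
}

/* Models spin_unlock_irqrestore(): unlock, then restore the saved interrupt state. */
void model_spin_unlock_irqrestore(struct cpu_model *cpu, bool flags)
{
    model_spin_unlock(cpu);
    cpu->irqs_enabled = flags;
}

/*
 * Would a handler that needs the shared lock spin forever if its interrupt
 * fired on this CPU right now? Only if it can be delivered while the lock is held.
 */
bool irq_would_deadlock(const struct cpu_model *cpu)
{
    return cpu->irqs_enabled && cpu->lock_held;
}
```

Starting from a CPU with interrupts enabled, model_spin_lock() leaves irq_would_deadlock() true for the whole critical section, while model_spin_lock_irqsave() leaves it false. That difference is the reason process-context code uses the irqsave variants for any lock an interrupt handler also takes.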

Diagnosis Tools

Kernel Debugging Tools

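Attach gdb, loaded with the kernel’s vmlinux file, to the hung kernel (for example through QEMU’s gdbstub, where each gdb thread is a virtual CPU, or through kgdb) and inspect the stack of the stuck CPU. The output below is illustrative: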

(gdb) thread 1
(gdb) bt
#0  _raw_spin_lock_irqsave () at kernel/locking/spinlock.c:123
#1  some_function () at drivers/some_driver.c:45

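The lockdep mechanism can also detect lock dependencies at runtime. It is enabled at build time (CONFIG_PROVE_LOCKING) rather than through a sysctl; on a kernel built with it, warnings go to the kernel log and summary counters are available in /proc/lockdep_stats:

cat /proc/lockdep_stats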
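The core idea behind lock-order checking can be sketched in a few lines of user-space C. This is a deliberately simplified toy, not lockdep's implementation: the real validator also tracks interrupt state, lock classes, and much more. The names order_graph, record_order, and MAX_CLASSES are hypothetical. The model records every "took inner while holding outer" pair as a directed edge and refuses any new edge that would close a cycle, which is the signature of a possible ABBA deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8 /* arbitrary limit for this toy model */

/* edge[a][b] is true once some code path took lock b while holding lock a. */
struct order_graph {
    bool edge[MAX_CLASSES][MAX_CLASSES];
};

void graph_init(struct order_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is there a chain of recorded orderings from 'from' to 'to'? */
static bool reachable(const struct order_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (g->edge[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "took 'inner' while holding 'outer'". Returns false, without
 * recording anything, if the new ordering would close a cycle, that is,
 * if 'outer' is already reachable from 'inner'.
 */
bool record_order(struct order_graph *g, int outer, int inner)
{
    bool seen[MAX_CLASSES] = { false };

    assert(outer >= 0 && outer < MAX_CLASSES);
    assert(inner >= 0 && inner < MAX_CLASSES);
    if (reachable(g, inner, outer, seen))
        return false;
    g->edge[outer][inner] = true;
    return true;
}
```

With locks numbered 0, 1, and 2, recording 0 then 1 and 1 then 2 succeeds, but a later attempt to record 2 then 0 is rejected because the chain 0, 1, 2 already exists. Note that the check fires on the ordering alone, so the problem is flagged even if the two paths never actually race.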

System Monitoring Utilities

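perf and ftrace help trace lock acquisition patterns. Lock events are exposed under the lock:* tracepoint group; which events exist depends on the kernel version and configuration, and perf list 'lock:*' shows what is available:

perf record -a -e sched:sched_switch -e 'lock:*' -- sleep 10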

Step-by-Step Solution

1. Identify the Locking Pattern

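Identify which locks are involved and who holds them. On a lockdep-enabled kernel, SysRq d (echo d > /proc/sysrq-trigger) prints all held locks, and SysRq l prints a backtrace for every active CPU. Note that /proc/locks lists file locks (flock and POSIX locks), not kernel spinlocks, so it does not help here. Cross-reference the output with dmesg for deadlock messages.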

2. Modify Spinlock Usage

Restructure the code so that nested spinlocks are either avoided or always taken in one documented order, and use the irqsave variants for any lock shared with an interrupt handler. For example:

unsigned long flags;

spin_lock_irqsave(&lock1, flags);
/* Short critical section: no sleeping, no lock taken out of the agreed order */
spin_unlock_irqrestore(&lock1, flags);
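When two locks really must be held together, one common way to keep the order consistent is to sort them by address before locking. The sketch below shows that technique in user space with a toy spinlock built on C11 atomic_flag; struct toy_spinlock and the toy_* functions are hypothetical stand-ins, not the kernel's spinlock_t API, and interrupt masking is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space spinlock; a stand-in for illustration, not spinlock_t. */
struct toy_spinlock {
    atomic_flag busy;
};

void toy_init(struct toy_spinlock *l)
{
    atomic_flag_clear(&l->busy);
}

/* Returns true if the lock was free and is now held by the caller. */
bool toy_trylock(struct toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire);
}

void toy_lock(struct toy_spinlock *l)
{
    while (!toy_trylock(l))
        ; /* busy-wait until the holder releases the lock */
}

void toy_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

/*
 * Take two distinct locks in address order, whatever order the caller
 * names them in, so every path agrees on which lock comes first.
 */
void toy_lock_pair(struct toy_spinlock *a, struct toy_spinlock *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        struct toy_spinlock *tmp = a;
        a = b;
        b = tmp;
    }
    toy_lock(a);
    toy_lock(b);
}

/* Release order does not affect deadlock freedom. */
void toy_unlock_pair(struct toy_spinlock *a, struct toy_spinlock *b)
{
    toy_unlock(a);
    toy_unlock(b);
}
```

Because toy_lock_pair(&x, &y) and toy_lock_pair(&y, &x) acquire the locks in the same underlying order, two threads calling them with swapped arguments cannot end up each holding one lock and waiting for the other. A fixed, documented order (for example "always lock1 before lock2") achieves the same thing when the set of locks is known in advance.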

3. Use Mutexes for Long-Running Locks

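Replace spinlocks with mutex_lock() in scenarios where the critical section is long or involves sleep operations. Mutexes can sleep, so they cannot be taken in interrupt context; move such work out of the handler into a threaded IRQ handler or a workqueue first.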

4. Test with Lockdep

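Enable CONFIG_PROVE_LOCKING (which selects CONFIG_LOCKDEP) in the kernel config and reproduce the issue. Lockdep reports an inconsistent lock order or IRQ-unsafe usage the first time it observes the pattern, with stack traces for the acquisitions involved, often before a real deadlock occurs.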

5. Validate with Kernel Modules

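After applying fixes, rebuild the module, unload the old one with modprobe -r, and load the new build with modprobe. Use modinfo to confirm that the installed module file is the one just built, then rerun the workload that triggered the hang.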