Understanding and Resolving Kernel-Level Deadlocks in Linux Due to Semaphore Contention

Kernel-Level Deadlocks: A Deep Dive into Semaphore Contention on Linux

Symptoms of the Issue

System administrators or kernel developers may encounter the following symptoms:

  • Processes or threads become unresponsive, appearing in the “D” (uninterruptible sleep) state in the output of ps -elF.
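  • Elevated load average without matching CPU utilization: tasks in the “D” state count toward the load average but consume no CPU while they sleep. Sustained kernel-space CPU usage points to spinlock contention rather than a sleeping semaphore.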
  • Kernel logs (dmesg) showing messages like kernel: INFO: task [process_name] blocked for more than 120 seconds.
  • With kernel lock debugging enabled, warnings such as “BUG: spinlock recursion” or lockdep's “possible circular locking dependency detected” in kernel traces.
  • Application-level errors such as “Resource deadlock avoided” (EDEADLK) from system calls that detect the cycle themselves, for example blocking file-lock requests.

Root Cause: Circular Semaphore Dependency

Deadlocks in the Linux kernel often arise from circular dependencies in semaphore acquisition. For example, two threads may each hold one semaphore while waiting for the one held by the other, forming a cycle in which neither can proceed. This typically occurs in kernel modules, or in multithreaded applications using equivalent user-space primitives, that manage synchronization incorrectly. Key factors include:

  • Failure to acquire semaphores in a consistent global order.
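  • Improper use of down(), which sleeps uninterruptibly and cannot be cancelled by a signal, or ignoring the return value of down_interruptible(), which returns -EINTR if a signal arrives before the semaphore is acquired.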
  • Overlapping critical sections or nested semaphore locks without proper release mechanisms.
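
The circular dependency described above can be modeled as a wait-for graph, where an edge from one thread to another means "is blocked on a semaphore held by". The following is a minimal user-space sketch of cycle detection on such a graph, not kernel code. The names in_wait_cycle, has_deadlock, and NO_ONE are hypothetical, and the sketch assumes each blocked thread waits on exactly one semaphore with a single holder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_ONE (-1)   /* hypothetical marker: thread is not blocked */

/*
 * waits_for[i] is the index of the thread holding the semaphore that
 * thread i is blocked on, or NO_ONE if thread i is runnable.
 * Returns true if thread `start` lies on a wait cycle.
 */
bool in_wait_cycle(const int *waits_for, size_t n, int start)
{
    assert(waits_for && start >= 0 && (size_t)start < n);
    int cur = start;
    /* A cycle through `start` has at most n edges, so n hops suffice. */
    for (size_t hops = 0; hops < n; hops++) {
        cur = waits_for[cur];
        if (cur == NO_ONE)
            return false;   /* chain ends at a runnable thread */
        if (cur == start)
            return true;    /* followed the chain back to the start */
    }
    return false;           /* chain enters a cycle that excludes `start` */
}

/* True if any thread is part of a wait cycle. */
bool has_deadlock(const int *waits_for, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (in_wait_cycle(waits_for, n, (int)i))
            return true;
    return false;
}
```

With two threads that each wait on the other, the array {1, 0} forms a cycle and has_deadlock() reports it; with {1, NO_ONE}, thread 1 is runnable and will eventually release what thread 0 needs.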

Example Code Triggering the Deadlock

The following C code demonstrates a common deadlock pattern in a kernel module:

void thread_a(struct semaphore *sem1, struct semaphore *sem2) {
    down(sem1);
    down(sem2);
    // Critical section
    up(sem2);
    up(sem1);
}
void thread_b(struct semaphore *sem1, struct semaphore *sem2) {
    down(sem2);
    down(sem1);
    // Critical section
    up(sem1);
    up(sem2);
}

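When thread_a and thread_b run concurrently, thread_a can acquire sem1 while thread_b acquires sem2. Each then blocks in its second down() call, waiting for the semaphore the other holds, and neither ever reaches up().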

Diagnosis Tools and Techniques

Use the following tools to identify and troubleshoot semaphore deadlocks:

1. dmesg for Kernel Logs

Check for blocked process messages:

$ dmesg | grep -i 'blocked'

Look for timestamps and process IDs (PIDs) of blocked tasks.
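
To extract the PID from these messages programmatically, a small parser is enough. The sketch below assumes the hung-task line has the form "INFO: task <name>:<pid> blocked for more than <N> seconds."; the function name parse_hung_task is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a hung-task line such as
 *   "INFO: task kworker/0:1:4321 blocked for more than 120 seconds."
 * Stores the PID and timeout and returns 0, or returns -1 if the line
 * does not match. Task names may contain ':' (e.g. kworker/0:1), so the
 * PID is read after the LAST colon before " blocked for more than ".
 */
int parse_hung_task(const char *line, int *pid, int *secs)
{
    assert(line && pid && secs);

    const char *task = strstr(line, "INFO: task ");
    if (!task)
        return -1;
    const char *name = task + strlen("INFO: task ");

    const char *tail = strstr(task, " blocked for more than ");
    if (!tail)
        return -1;

    /* Walk backwards from the suffix to the colon separating name and PID. */
    const char *colon = tail;
    while (colon > name && *colon != ':')
        colon--;
    if (*colon != ':')
        return -1;

    if (sscanf(colon + 1, "%d", pid) != 1)
        return -1;
    if (sscanf(tail, " blocked for more than %d seconds", secs) != 1)
        return -1;
    return 0;
}
```

Feeding each line of dmesg output through parse_hung_task() yields the PIDs to inspect in the following steps.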

2. ps and top

Identify processes in the “D” state:

$ ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'

Use top to monitor CPU usage for anomalies in kernel threads.
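
The same check can be made from a monitoring program by reading /proc/<pid>/stat, where the process state is the third field. This is a minimal sketch; the helper names stat_state and pid_state are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the state character from one line of /proc/<pid>/stat, or '\0'
 * if the line is malformed. The second field (the command name) is
 * wrapped in parentheses and may itself contain spaces or ')', so the
 * state is located after the LAST ')'.
 */
char stat_state(const char *stat_line)
{
    assert(stat_line);
    const char *close = strrchr(stat_line, ')');
    if (!close || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* Read the state of a live process; '\0' if it cannot be read. */
char pid_state(int pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '\0';
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? stat_state(buf) : '\0';
}
```

A return value of 'D' from pid_state() corresponds to the uninterruptible sleep state that ps reports.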

3. perf for Stack Tracing

Trace kernel function calls to locate blocking points:

$ perf record -a -g -s sleep 10
$ perf report

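Look for functions like down(), schedule(), or spin_lock() in the call stacks. Because perf samples on-CPU activity by default, threads sleeping in down() contribute few samples; contended spinlocks show up most clearly here. To see where a sleeping task is blocked, read /proc/<pid>/stack as root.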

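4. SysRq for Kernel Inspection

Trigger a SysRq command as root (echo t > /proc/sysrq-trigger) to dump a stack trace of every task to the kernel log. To limit the dump to blocked (uninterruptible) tasks, use echo w > /proc/sysrq-trigger instead. Read the output with dmesg and look for threads waiting in down().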

5. kprobe and tracepoint for Dynamic Tracing

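Use SystemTap or eBPF to trace semaphore operations. A SystemTap example (tid() identifies the individual thread; the entry probe fires before the semaphore is acquired, the return probe after):

probe kernel.function("down") {
    printf("Thread %d waiting for semaphore %p\n", tid(), $sem);
}
probe kernel.function("down").return {
    printf("Thread %d acquired semaphore %p\n", tid(), @entry($sem));
}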

This helps track the order and timing of semaphore acquisitions.

Step-by-Step Solution

To resolve a semaphore deadlock, follow these steps:

  1. Identify Blocked Processes: Use ps -elF to find PIDs of processes in the “D” state. Cross-reference with dmesg for detailed logs.
  2. Trace the Call Stack: Execute echo t > /proc/sysrq-trigger to capture a stack dump. Search for down(), schedule(), or spin_lock() entries.
  3. Analyze with perf: Record and analyze kernel events to pinpoint where threads are waiting on semaphores. Compare the call graphs of both threads.
  4. Review Code for Circular Dependencies: Examine the application or kernel module for inconsistent semaphore acquisition order. Ensure all threads follow a global ordering strategy (e.g., always acquire sem1 before sem2).
  5. Modify the Code: Refactor the logic to enforce a strict acquisition order. Example fix:
    void thread_a(struct semaphore *sem1, struct semaphore *sem2) {
        down(sem1);
        down(sem2);
        // Critical section
        up(sem2);
        up(sem1);
    }
    void thread_b(struct semaphore *sem1, struct semaphore *sem2) {
        down(sem1); // Enforce same order as thread_a
        down(sem2);
        // Critical section
        up(sem2);
        up(sem1);
    }
    
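  6. Test with Timeouts: Replace down() with down_timeout(), which takes a timeout in jiffies and returns -ETIME when it expires. On failure, release any semaphore already held before returning, so that an ordering bug surfaces as an error instead of a hung task.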
  7. Rebuild and Re-deploy: Recompile the kernel module or application and monitor with dmesg and ps to confirm resolution.
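  8. Implement Lock Ordering Policies: Document the acquisition order and enable lockdep (CONFIG_PROVE_LOCKING, which selects CONFIG_LOCKDEP) on test kernels to detect potential deadlocks at runtime. Note that lockdep tracks spinlocks, mutexes, and rw_semaphores but not counting semaphores (struct semaphore), so where a semaphore serves as a plain mutual-exclusion lock, converting it to a mutex brings it under lockdep's checks.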
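
Steps 5 and 6 can be combined in one helper that always acquires a pair of semaphores in the same global order and gives up after a timeout. The following is a user-space sketch using POSIX semaphores as a stand-in for the kernel API (sem_timedwait() and sem_post() play the roles of down_timeout() and up()). The names sem_wait_ms, lock_pair, and unlock_pair are hypothetical, and ordering by address is one possible global order, not the only one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdint.h>
#include <time.h>

/* Wait on `s` for at most `ms` milliseconds; 0 on success, -1 otherwise. */
static int sem_wait_ms(sem_t *s, long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);   /* sem_timedwait takes an absolute deadline */
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }

    int rc;
    while ((rc = sem_timedwait(s, &ts)) == -1 && errno == EINTR)
        ;   /* retry if interrupted by a signal */
    return rc;
}

/*
 * Acquire two distinct semaphores, lowest address first, regardless of
 * the order in which the caller passes them. Returns 0 with both held,
 * or -1 with neither held.
 */
int lock_pair(sem_t *a, sem_t *b, long timeout_ms)
{
    assert(a && b && a != b);
    sem_t *first  = (uintptr_t)a < (uintptr_t)b ? a : b;
    sem_t *second = (first == a) ? b : a;

    if (sem_wait_ms(first, timeout_ms) != 0)
        return -1;
    if (sem_wait_ms(second, timeout_ms) != 0) {
        sem_post(first);   /* back out so other threads can make progress */
        return -1;
    }
    return 0;
}

/* Release both semaphores; release order does not affect correctness. */
void unlock_pair(sem_t *a, sem_t *b)
{
    sem_post(a);
    sem_post(b);
}
```

Because every caller goes through lock_pair(), thread_a and thread_b can pass their arguments in either order without creating a cycle, and a timeout leaves no semaphore held.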