Linux Kernel Deadlock in sysfs Filesystem Operations During Module Unloading

Overview

This post details a complex Linux kernel deadlock scenario triggered during sysfs filesystem operations, specifically when unloading kernel modules. The issue manifests as system instability, with processes hanging indefinitely due to conflicting lock acquisition orders.

Symptoms

System administrators may notice the following symptoms:

  • The system becomes sluggish or unresponsive. The load average climbs because blocked tasks count toward it, yet little meaningful work is being processed.
  • Kernel logs (dmesg) show lockdep reports such as “WARNING: possible circular locking dependency detected” or hung-task reports such as “INFO: task rmmod:<pid> blocked for more than 120 seconds.”
  • Attempting to unload a kernel module using rmmod hangs indefinitely, or fails with a report that the module is in use.
  • Processes involved in sysfs operations (e.g., udevd or user-space tools interacting with /sys) become stuck in D (uninterruptible sleep) state.

Indicators in Kernel Logs

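The entries below are a simplified, illustrative rendering. A real lockdep report is much longer, begins with “WARNING: possible circular locking dependency detected,” and includes a stack trace for each acquisition: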

[ 1234.567890] deadlock: potential deadlocks on cpu 0:  
[ 1234.567891] kobject: 4321: sysfs_remove_file failed.  
[ 1234.567892] lockdep: circular locking dependencies detected.  
[ 1234.567893] Mutex at 0xffff880000001234 (sysfs_mutex):  
[ 1234.567894]  (0) -> acquire at 0xffff88000000abcd (another_mutex)  
[ 1234.567895]  (1) -> acquire at 0xffff880000001234 (sysfs_mutex)  

Such logs indicate that a module’s cleanup routine is attempting to release a sysfs resource while another thread holds a lock that the module is waiting for.
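
When logs are collected from many machines, it can help to filter them for locking-related reports automatically. The following is a minimal sketch, assuming the log has already been read line by line; the function name line_has_lock_marker and the marker list are illustrative, and the strings are substrings of messages printed by lockdep and the hung-task detector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Substrings of messages printed by the kernel's lockdep validator and
 * hung-task detector. Matching on substrings keeps the check independent
 * of the timestamp prefix and of task names and PIDs. */
static const char *const lock_markers[] = {
    "possible circular locking dependency detected", /* lockdep */
    "possible recursive locking detected",           /* lockdep */
    "blocked for more than",                         /* hung-task detector */
};

/* Return true if one kernel log line contains a locking-related marker. */
bool line_has_lock_marker(const char *line)
{
    size_t count = sizeof(lock_markers) / sizeof(lock_markers[0]);

    for (size_t i = 0; i < count; i++) {
        if (strstr(line, lock_markers[i]) != NULL)
            return true;
    }
    return false;
}
```

A match only says that the kernel flagged something; the full report that follows the matching line is still needed to see which locks and code paths are involved.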

Root Cause Analysis

The deadlock arises from improper synchronization between kernel module operations and sysfs filesystem interactions. Specifically, when a module unloads, it may attempt to delete sysfs attributes or directories while another thread (e.g., a user-space process or another kernel thread) is accessing those same sysfs objects, leading to a circular lock dependency.

Locking Mechanism Misuse

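sysfs serializes structural changes internally: older kernels use a global sysfs_mutex, and since the move to kernfs in Linux 3.14, kernfs locks play that role. In addition, sysfs_remove_file() does not return until every in-flight show() or store() callback for the attribute has finished. If a module’s exit() function calls sysfs_remove_file() while holding a lock that such a callback needs, the removal waits for the callback and the callback waits for the lock, so neither makes progress. For example, a module might: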

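#include <linux/module.h>
#include <linux/kobject.h>
static struct kobject *my_kobj;
static ssize_t my_attr_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
    /* Access sysfs data; this can still be running when my_exit() starts */
    return sysfs_emit(buf, "%d\n", 0); /* 0 is a placeholder value */
}
static struct kobj_attribute my_attr = __ATTR_RO(my_attr);
static void __exit my_exit(void)
{
    sysfs_remove_file(my_kobj, &my_attr.attr);
}
module_exit(my_exit);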

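If my_attr_show is running when the unload begins, sysfs_remove_file() blocks until it returns. That wait is harmless on its own; it turns into a deadlock when my_attr_show is itself blocked on a lock that the exit path (or its caller) already holds.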
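
The wait-for-callbacks behavior can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct attr_model and the attr_* functions are hypothetical names, the model is single-threaded and unsynchronized, and it only loosely mirrors the active-reference bookkeeping that kernfs performs.

```c
#include <stdbool.h>

/* Userspace model of one attribute's "active reference" state.
 * All names are hypothetical; the real logic lives in kernfs. */
struct attr_model {
    int active;        /* show()/store() callbacks currently running */
    bool deactivated;  /* set once removal has started */
};

/* Called before a callback runs: refuses new callers once removal began. */
bool attr_get_active(struct attr_model *a)
{
    if (a->deactivated)
        return false;
    a->active++;
    return true;
}

/* Called when a callback returns. */
void attr_put_active(struct attr_model *a)
{
    a->active--;
}

/* First half of removal: block new callers. */
void attr_deactivate(struct attr_model *a)
{
    a->deactivated = true;
}

/* Removal may complete only when this returns true. The real removal
 * path sleeps until the equivalent condition holds. */
bool attr_drained(const struct attr_model *a)
{
    return a->deactivated && a->active == 0;
}
```

In this model, a callback that is stuck waiting for a lock never reaches attr_put_active(), so attr_drained() stays false. If the code waiting for the drain is the one holding that lock, the two sides wait on each other indefinitely.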

Kernel API Misuse

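The sysfs_remove_file() function waits for in-flight callbacks itself, so the attribute does not have to be idle beforehand, but the kobject must still be alive when it is called. Call kobject_put() only after the attributes have been removed: dropping the last reference first frees the kobject out from under the removal, and never dropping it leaks the kobject and leaves its sysfs directory behind after the module is gone.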

Diagnosis Tools

System administrators can use the following tools to identify and troubleshoot this issue:

  • dmesg: To capture kernel log messages indicating deadlocks or lock dependencies.
  • lockdep: The kernel’s runtime lock validator, enabled via CONFIG_PROVE_LOCKING (which selects CONFIG_LOCKDEP). It records the order in which locks are acquired and reports potential deadlocks even if they have not yet occurred.
  • perf: To trace system calls and kernel functions involved in sysfs operations.
  • ltrace or strace: To monitor user-space calls to sysfs, such as read() or write() on /sys/class/ entries.
  • /proc/lockdep and /proc/lockdep_stats: To inspect the lock classes and dependency statistics recorded by lockdep (present only when lockdep is enabled).
  • ps and top: To identify processes stuck in the D state.
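
The D-state check that ps and top perform can also be scripted. The following is a minimal sketch that extracts the state field from the text of /proc/<pid>/stat; the function names are illustrative, and reading the file itself (for example with fopen() and fgets()) is left to the caller.

```c
#include <string.h>

/* Extract the one-letter process state from the contents of
 * /proc/<pid>/stat, whose layout is "pid (comm) state ppid ...".
 * comm may itself contain spaces or parentheses, so search for the
 * last ')' rather than splitting on whitespace.
 * Returns the state character, or '\0' if the text is malformed. */
char proc_stat_state(const char *stat_text)
{
    const char *close = strrchr(stat_text, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* 'D' is uninterruptible sleep, the state shown for tasks blocked
 * inside the kernel, for example on a mutex. */
int proc_stat_is_blocked(const char *stat_text)
{
    return proc_stat_state(stat_text) == 'D';
}
```

A task that stays in D across several samples taken a few seconds apart is a better deadlock indicator than a single reading, since short D-state periods are normal during I/O.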

Using lockdep and dmesg

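On a kernel built with lockdep, read /proc/lockdep and /proc/lockdep_chains to inspect the recorded lock classes and dependency chains. Schematically, the dependency of interest looks like this: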

Lock dependency chain:
(0) sysfs_mutex [0xffff880000001234]
|
| (1) another_mutex [0xffff88000000abcd]

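A chain in one direction is not a problem by itself; a circular dependency exists when another code path acquires the same locks in the opposite order. In that case dmesg shows lockdep’s “WARNING: possible circular locking dependency detected” report and, once a task has actually hung, the hung-task detector’s “INFO: task <name>:<pid> blocked for more than 120 seconds” message.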
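
The idea behind lock-order validation can be shown with a toy model. This is a sketch only: lockdep itself works on lock classes and is far more elaborate, and the names held_before, path_exists, and record_lock_order, as well as the MAX_LOCKS bound, are assumptions made for illustration. Locks are identified by small integers, and indices are not range-checked.

```c
#include <stdbool.h>

#define MAX_LOCKS 8   /* arbitrary small bound for the sketch */

/* held_before[a][b] is true once lock a was seen held while b was acquired. */
static bool held_before[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is there a recorded path from lock 'from' to lock 'to'? */
static bool path_exists(int from, int to, bool *visited)
{
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++) {
        if (held_before[from][next] && !visited[next] &&
            path_exists(next, to, visited))
            return true;
    }
    return false;
}

/* Record "held is held while acquiring wanted".
 * Returns false (and records nothing) if that ordering would close a
 * cycle, i.e. if earlier code paths already ordered wanted before held. */
bool record_lock_order(int held, int wanted)
{
    bool visited[MAX_LOCKS] = { false };

    if (path_exists(wanted, held, visited))
        return false;
    held_before[held][wanted] = true;
    return true;
}
```

With sysfs_mutex as lock 0 and another_mutex as lock 1, recording the order 0 then 1 succeeds, and a later attempt to record 1 then 0 is rejected. That rejection corresponds to the point at which a validator would print its circular-dependency report, before any task has actually hung.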

Step-by-Step Solution

To resolve this deadlock, follow these steps to ensure proper synchronization during sysfs operations and module unloading:

Implementing Proper Locking

Rely on the kobject’s built-in reference count to manage the object’s lifetime, and check the result of every sysfs call. For example:

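/* In module initialization: */
int ret;

my_kobj = kobject_create_and_add("my_device", NULL);
if (!my_kobj)
    return -ENOMEM;
sysfs_attr_init(&my_attr.attr);
ret = sysfs_create_file(my_kobj, &my_attr.attr);
if (ret) {
    kobject_put(my_kobj); /* drop the reference taken at creation */
    return ret;
}
return 0;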

In the module’s exit() function, ensure the kobject is properly released:

sysfs_remove_file(my_kobj, &my_attr.attr);
kobject_put(my_kobj);

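sysfs_remove_file() returns only after in-flight show() and store() callbacks for the attribute have finished, so the exit path must not hold any lock that those callbacks take. kobject_put() then drops the module’s reference; when the last reference goes away, the kobject and its sysfs directory are released.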
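
The get/put discipline behind kobject_put() can be sketched in userspace. The code below is an illustration under stated assumptions, not the kernel’s implementation: struct ref_obj and the ref_* functions are hypothetical stand-ins for the reference count that a kobject carries.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a reference-counted object. */
struct ref_obj {
    atomic_int refcount;
    void (*release)(struct ref_obj *obj);  /* runs when the count hits zero */
};

void ref_init(struct ref_obj *obj, void (*release)(struct ref_obj *))
{
    atomic_init(&obj->refcount, 1);   /* the creator owns the first reference */
    obj->release = release;
}

/* Take an additional reference. */
void ref_get(struct ref_obj *obj)
{
    atomic_fetch_add(&obj->refcount, 1);
}

/* Drop one reference; returns 1 if this call released the object. */
int ref_put(struct ref_obj *obj)
{
    if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
        if (obj->release != NULL)
            obj->release(obj);
        return 1;
    }
    return 0;
}
```

The property that matters for module unloading is that the release step runs exactly once, on whichever put drops the last reference. Cleanup code therefore drops its own reference and does not free the object directly.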

Enforcing Serialization with Mutexes

Use a dedicated mutex within the module to serialize access to sysfs resources:

DEFINE_MUTEX(my_mutex);

Wrap sysfs operations with the mutex:

mutex_lock(&my_mutex);
sysfs_create_file(my_kobj, &my_attr.attr);
mutex_unlock(&my_mutex);

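This serializes the module’s own setup and teardown steps and prevents races between them. Do not hold my_mutex across sysfs_remove_file() if a show() or store() callback also takes it: the removal waits for the callback to finish while the callback waits for the mutex, which is exactly the deadlock described above. Release the mutex first, then remove the attribute.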

Testing and Validation

After updating the module, recompile it and test the scenario:

sudo insmod my_module.ko
sudo rmmod my_module

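Monitor dmesg and lockdep output to confirm the deadlock is resolved, ideally while another process reads the attribute in a loop during the unload so that the race is actually exercised. Additionally, verify that the my_device directory is removed cleanly without hangs; with the NULL parent passed to kobject_create_and_add() above, it appears as /sys/my_device.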

Kernel Patching (if necessary)

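If the issue reproduces with in-tree code on a stock kernel, the fix belongs in the offending driver or subsystem. Identify the code path from the lockdep report, check whether a newer stable kernel already contains a fix, and report the problem upstream otherwise. The sysfs file-removal code lives in fs/sysfs/file.c, with the underlying logic in fs/kernfs/. It already waits for active references before completing a removal, so adding timeouts there is more likely to hide a lock-ordering bug than to resolve it.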
