Debugging a Linux Kernel Disk I/O Deadlock: Symptoms, Diagnosis, and Resolution
Symptoms of a Disk I/O Deadlock
A disk I/O deadlock manifests as an unresponsive system where processes block indefinitely while waiting for disk operations to complete. Common indicators include:
- Processes stuck in the "D" state (uninterruptible sleep) in `ps -el` (see the example after this list)
- High I/O wait times observed in `top` or `vmstat`
- Kernel logs (`dmesg`) showing hung-task warnings such as "blocked for more than 120 seconds"
- No forward progress in disk activity, even though `iostat` reports high `%util`
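As a quick first check, the commands below list D-state tasks and dump the kernel stack of one of them. This is a minimal sketch: the PID is illustrative, and `/proc/<pid>/stack` and SysRq must be available on your kernel.

```bash
# List tasks in uninterruptible sleep and the kernel function they are waiting in
ps -eo state,pid,wchan:32,comm | awk '$1 == "D"'

# Dump the in-kernel stack of one blocked task (12345 is an illustrative PID)
cat /proc/12345/stack

# Log the stacks of all blocked tasks to the kernel ring buffer (requires SysRq)
echo w > /proc/sysrq-trigger
```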
This issue often occurs in environments with heavy concurrent I/O workloads or custom kernel modules interacting with block devices.
Root Cause Analysis
The deadlock stems from improper synchronization in or around the kernel's block layer, typically in `submit_bio()` paths or in request-queue management. The classic pattern is a lock held across an I/O wait: one context acquires a lock and then sleeps until an I/O completes, while the completion path for that same I/O needs the lock before it can run, so neither side makes progress (the Example Code section below shows a minimal version of this pattern). Typical triggers include the following; the commands after the list help check which of them applies:
- A custom driver that fails to release a lock during `bio_endio()`
- Improper use of `spin_lock_irqsave()` or `mutex_lock()` in request handling
- A deadlock between `blkdev_issue_flush()` and the `bio->bi_end_io()` callback
- Kernel versions with regressions in the I/O scheduler (e.g., `deadline` or `noop`)
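To narrow down which of these factors is in play, the following commands are a reasonable starting point. They assume a typical distro layout where the kernel config is available under /boot; adjust paths as needed.

```bash
# Which I/O scheduler is each block device using? (the bracketed name is active)
grep . /sys/block/*/queue/scheduler

# Is the kernel tainted by out-of-tree or proprietary modules? (0 means untainted)
cat /proc/sys/kernel/tainted

# Was the running kernel built with lock debugging? (helps catch this bug class)
grep -E 'CONFIG_(PROVE_LOCKING|DEBUG_SPINLOCK)=' "/boot/config-$(uname -r)"
```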
Diagnosis Tools and Techniques
System administrators should use the following tools to identify and analyze the deadlock:
- `dmesg`: Look for hung-task reports ("blocked for more than 120 seconds") and stack traces that mention block-layer functions such as `blkdev_issue_flush`.
- `iostat -x 1`: Watch for persistently high `%util` with little or no completed throughput.
- `pidstat -d 1`: Identify processes spending excessive time waiting on I/O.
- `ftrace`: Trace I/O submission and completion. Example, using the block-layer tracepoints:

  ```bash
  echo 'block:block_bio_queue block:block_rq_issue block:block_rq_complete' > /sys/kernel/debug/tracing/set_event
  echo 1 > /sys/kernel/debug/tracing/tracing_on
  ```

- `perf`: Profile lock contention with `perf lock` or gather counters with `perf stat` (see the example after this list).
- `gdb`: Inspect a captured crash dump offline together with a debug-symbol `vmlinux` (paths vary by distribution), for example:

  ```bash
  gdb -ex "set pagination 0" -ex "thread apply all bt" --batch \
      /usr/lib/debug/boot/vmlinux-$(uname -r) /var/crash/vmcore
  ```
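As one concrete way to combine these tools, the sketch below records lock contention system-wide for about ten seconds and then summarizes it, and alternatively watches raw block-layer traffic on a single device. `perf lock` needs a kernel with the lock tracepoints enabled, and the device name is illustrative.

```bash
# Record lock contention on all CPUs for ~10 seconds, then summarize it
perf lock record -a -- sleep 10
perf lock report

# Or watch raw block-layer traffic on one device to see whether requests complete
blktrace -d /dev/sda -o - | blkparse -i -
```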
Step-by-Step Solution
- Identify Locked Processes: Run `ps -eo pid,state,wchan:32,comm | awk '$2=="D"'` to list tasks stuck in the "D" state, and note their PIDs.
- Inspect Kernel Logs: Use `dmesg | grep -i "blocked for more"` to find hung-task reports. Example output:

  ```
  [12345.67890] INFO: task kworker/u32:1:12345 blocked for more than 120 seconds.
  ```

- Analyze with ftrace: Enable the block-layer tracepoints, let the workload run, and read the trace. Requests that are issued but never completed point at the stuck device or queue:

  ```bash
  echo 'block:block_rq_issue block:block_rq_complete' > /sys/kernel/debug/tracing/set_event
  echo 1 > /sys/kernel/debug/tracing/tracing_on
  sleep 10
  cat /sys/kernel/debug/tracing/trace
  ```

- Update Kernel and Drivers: Apply patches from https://git.kernel.org or update storage drivers (e.g., `scsi_mod` or `dm-multipath`).
- Adjust I/O Scheduler: Temporarily switch to a different scheduler to test. Check which schedulers the running kernel offers first; recent multi-queue kernels use `mq-deadline` rather than `deadline`:

  ```bash
  cat /sys/block/sda/queue/scheduler
  echo mq-deadline > /sys/block/sda/queue/scheduler
  ```

- Debug with GDB: Analyze the vmcore under `/var/crash/` together with the matching debug-symbol `vmlinux` (see the `gdb` example above) and inspect the call stacks of blocked tasks. Look for block-layer functions such as `blk_start_request()` or `__make_request()` on older kernels, or `blk_mq_*` functions on newer ones.
- Review Custom Modules: If third-party drivers are in use, unload them and retest. Use `modinfo` to verify each module's version and the kernel it was built against.
- Reproduce and Test: Use `fio` to simulate I/O workloads and verify that the deadlock no longer occurs after the fixes (a longer soak test follows this list). Example:

  ```bash
  fio --name=test --ioengine=libaio --direct=1 --size=1G --rw=randrw
  ```
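To tie reproduction and verification together, one possible soak test is sketched below: it watches for new hung-task reports while a heavier concurrent workload runs. The file path, runtime, and job counts are illustrative; run it interactively so the background job can be stopped with `kill %1`.

```bash
# Watch for new hung-task reports while the workload runs
dmesg --follow | grep --line-buffered "blocked for more than" &

# Drive concurrent random I/O against a scratch file for ten minutes
fio --name=soak --filename=/var/tmp/fio-test --size=2G --ioengine=libaio \
    --direct=1 --rw=randrw --iodepth=32 --numjobs=8 --time_based --runtime=600

# Stop the background log watcher
kill %1
```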
Example Code: Kernel Module Locking Bug
The following snippet sketches the kind of improper lock usage described above in a hypothetical custom driver. Here `submit_and_wait()` is a stand-in for any code path that blocks until the bio finishes; it is not a real kernel API.

```c
static DEFINE_SPINLOCK(my_lock);

void my_bio_end_io(struct bio *bio) {
    spin_lock(&my_lock);    /* never acquired: the sleeping submitter holds it */
    spin_unlock(&my_lock);
}

void my_submit_path(struct bio *bio) {
    spin_lock(&my_lock);
    submit_and_wait(bio);   /* hypothetical helper: sleeps until my_bio_end_io() runs */
    spin_unlock(&my_lock);  /* never reached */
}
```

Because `my_bio_end_io()` cannot take the lock until the submitter releases it, and the submitter cannot release it until the I/O completes, neither side makes progress: a deadlock. The fix is to drop `my_lock` before waiting for completion (or to defer the work that needs the lock, for example to a workqueue), and to use the `_irqsave` variants whenever a lock is shared with completion handlers that may run in interrupt context.
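When a bug of this kind (or the related mistake of sleeping while holding a spinlock) is present, the kernel usually says so in its log. The grep below is a quick, low-risk check; note that the last two message types only appear if atomic-sleep or lockdep debugging is enabled in the kernel configuration.

```bash
# Kernel messages that typically accompany this class of locking bug
dmesg | grep -E "blocked for more than|scheduling while atomic|sleeping function called from invalid context|possible recursive locking"
```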
Conclusion
Resolving a disk I/O deadlock requires meticulous analysis of kernel logs, tracing tools, and synchronization mechanisms. By isolating the deadlock source, updating drivers, and verifying changes with workload testing, administrators can restore system stability. Always monitor kernel changes and apply upstream patches for long-term resolution.