Debugging a Linux Kernel Disk I/O Deadlock: Symptoms, Diagnosis, and Resolution
Symptoms of a Disk I/O Deadlock
A disk I/O deadlock manifests as an unresponsive system where processes block indefinitely while waiting for disk operations to complete. Common indicators include:
- Processes stuck in the "D" state (uninterruptible sleep) in `ps -el` (see the example after this list)
- High I/O wait times observed in `top` or `vmstat`
- Kernel logs (`dmesg`) showing hung-task warnings such as "blocked for more than 120 seconds"
- No forward progress in disk activity, even though `iostat` reports high `%util`
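As a quick first check, the commands below list D-state tasks and dump the kernel stack of one of them. This is a minimal sketch: the PID is illustrative, and `/proc/<pid>/stack` and SysRq must be available on your kernel.

```bash
# List tasks in uninterruptible sleep and the kernel function they are waiting in
ps -eo state,pid,wchan:32,comm | awk '$1 == "D"'

# Dump the in-kernel stack of one blocked task (12345 is an illustrative PID)
cat /proc/12345/stack

# Log the stacks of all blocked tasks to the kernel ring buffer (requires SysRq)
echo w > /proc/sysrq-trigger
```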
This issue often occurs in environments with heavy concurrent I/O workloads or custom kernel modules interacting with block devices.
Root Cause Analysis
The deadlock stems from improper synchronization in or around the kernel's block layer, typically in `submit_bio()` paths or in request-queue management. The classic pattern is a lock held across an I/O wait: one context acquires a lock and then sleeps until an I/O completes, while the completion path for that same I/O needs the lock before it can run, so neither side makes progress (the Example Code section below shows a minimal version of this pattern). Typical triggers include the following; the commands after the list help check which of them applies:
- A custom driver that fails to release a lock during `bio_endio()`
- Improper use of `spin_lock_irqsave()` or `mutex_lock()` in request handling
- A deadlock between `blkdev_issue_flush()` and the `bio->bi_end_io()` callback
- Kernel versions with regressions in the I/O scheduler (e.g., `deadline` or `noop`)
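To narrow down which of these factors is in play, the following commands are a reasonable starting point. They assume a typical distro layout where the kernel config is available under /boot; adjust paths as needed.

```bash
# Which I/O scheduler is each block device using? (the bracketed name is active)
grep . /sys/block/*/queue/scheduler

# Is the kernel tainted by out-of-tree or proprietary modules? (0 means untainted)
cat /proc/sys/kernel/tainted

# Was the running kernel built with lock debugging? (helps catch this bug class)
grep -E 'CONFIG_(PROVE_LOCKING|DEBUG_SPINLOCK)=' "/boot/config-$(uname -r)"
```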
Diagnosis Tools and Techniques
System administrators should use the following tools to identify and analyze the deadlock:
- `dmesg`: Look for hung-task reports ("blocked for more than 120 seconds") and stack traces that mention block-layer functions such as `blkdev_issue_flush`.
- `iostat -x 1`: Watch for persistently high `%util` with little or no completed throughput.
- `pidstat -d 1`: Identify processes spending excessive time waiting on I/O.
- `ftrace`: Trace I/O submission and completion. Example, using the block-layer tracepoints:

  ```bash
  echo 'block:block_bio_queue block:block_rq_issue block:block_rq_complete' > /sys/kernel/debug/tracing/set_event
  echo 1 > /sys/kernel/debug/tracing/tracing_on
  ```

- `perf`: Profile lock contention with `perf lock` or gather counters with `perf stat` (see the example after this list).
- `gdb`: Inspect a captured crash dump offline together with a debug-symbol `vmlinux` (paths vary by distribution), for example:

  ```bash
  gdb -ex "set pagination 0" -ex "thread apply all bt" --batch \
      /usr/lib/debug/boot/vmlinux-$(uname -r) /var/crash/vmcore
  ```
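As one concrete way to combine these tools, the sketch below records lock contention system-wide for about ten seconds and then summarizes it, and alternatively watches raw block-layer traffic on a single device. `perf lock` needs a kernel with the lock tracepoints enabled, and the device name is illustrative.

```bash
# Record lock contention on all CPUs for ~10 seconds, then summarize it
perf lock record -a -- sleep 10
perf lock report

# Or watch raw block-layer traffic on one device to see whether requests complete
blktrace -d /dev/sda -o - | blkparse -i -
```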
Step-by-Step Solution
- Identify Locked Processes: Run `ps -eo pid,state,wchan:32,comm | awk '$2=="D"'` to list tasks stuck in the "D" state, and note their PIDs.
- Inspect Kernel Logs: Use `dmesg | grep -i "blocked for more"` to find hung-task reports. Example output:

  ```
  [12345.67890] INFO: task kworker/u32:1:12345 blocked for more than 120 seconds.
  ```

- Analyze with ftrace: Enable the block-layer tracepoints, let the workload run, and read the trace. Requests that are issued but never completed point at the stuck device or queue:

  ```bash
  echo 'block:block_rq_issue block:block_rq_complete' > /sys/kernel/debug/tracing/set_event
  echo 1 > /sys/kernel/debug/tracing/tracing_on
  sleep 10
  cat /sys/kernel/debug/tracing/trace
  ```

- Update Kernel and Drivers: Apply patches from https://git.kernel.org or update storage drivers (e.g., `scsi_mod` or `dm-multipath`).
- Adjust I/O Scheduler: Temporarily switch to a different scheduler to test. Check which schedulers the running kernel offers first; recent multi-queue kernels use `mq-deadline` rather than `deadline`:

  ```bash
  cat /sys/block/sda/queue/scheduler
  echo mq-deadline > /sys/block/sda/queue/scheduler
  ```

- Debug with GDB: Analyze the vmcore under `/var/crash/` together with the matching debug-symbol `vmlinux` (see the `gdb` example above) and inspect the call stacks of blocked tasks. Look for block-layer functions such as `blk_start_request()` or `__make_request()` on older kernels, or `blk_mq_*` functions on newer ones.
- Review Custom Modules: If third-party drivers are in use, unload them and retest. Use `modinfo` to verify each module's version and the kernel it was built against.
- Reproduce and Test: Use `fio` to simulate I/O workloads and verify that the deadlock no longer occurs after the fixes (a longer soak test follows this list). Example:

  ```bash
  fio --name=test --ioengine=libaio --direct=1 --size=1G --rw=randrw
  ```
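To tie reproduction and verification together, one possible soak test is sketched below: it watches for new hung-task reports while a heavier concurrent workload runs. The file path, runtime, and job counts are illustrative; run it interactively so the background job can be stopped with `kill %1`.

```bash
# Watch for new hung-task reports while the workload runs
dmesg --follow | grep --line-buffered "blocked for more than" &

# Drive concurrent random I/O against a scratch file for ten minutes
fio --name=soak --filename=/var/tmp/fio-test --size=2G --ioengine=libaio \
    --direct=1 --rw=randrw --iodepth=32 --numjobs=8 --time_based --runtime=600

# Stop the background log watcher
kill %1
```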
Example Code: Kernel Module Locking Bug
The following snippet sketches the kind of improper lock usage described above in a hypothetical custom driver. Here `submit_and_wait()` is a stand-in for any code path that blocks until the bio finishes; it is not a real kernel API.

```c
static DEFINE_SPINLOCK(my_lock);

void my_bio_end_io(struct bio *bio) {
    spin_lock(&my_lock);    /* never acquired: the sleeping submitter holds it */
    spin_unlock(&my_lock);
}

void my_submit_path(struct bio *bio) {
    spin_lock(&my_lock);
    submit_and_wait(bio);   /* hypothetical helper: sleeps until my_bio_end_io() runs */
    spin_unlock(&my_lock);  /* never reached */
}
```

Because `my_bio_end_io()` cannot take the lock until the submitter releases it, and the submitter cannot release it until the I/O completes, neither side makes progress: a deadlock. The fix is to drop `my_lock` before waiting for completion (or to defer the work that needs the lock, for example to a workqueue), and to use the `_irqsave` variants whenever a lock is shared with completion handlers that may run in interrupt context.
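When a bug of this kind (or the related mistake of sleeping while holding a spinlock) is present, the kernel usually says so in its log. The grep below is a quick, low-risk check; note that the last two message types only appear if atomic-sleep or lockdep debugging is enabled in the kernel configuration.

```bash
# Kernel messages that typically accompany this class of locking bug
dmesg | grep -E "blocked for more than|scheduling while atomic|sleeping function called from invalid context|possible recursive locking"
```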
Conclusion
Resolving a disk I/O deadlock requires meticulous analysis of kernel logs, tracing tools, and synchronization mechanisms. By isolating the deadlock source, updating drivers, and verifying changes with workload testing, administrators can restore system stability. Always monitor kernel changes and apply upstream patches for long-term resolution.