March 11, 2016 - Shuah Khan
The Linux Kernel Has Bugs! Really!
A Guide to Finding and Fixing Linux Kernel Bugs
“Our new Constitution is now established, and has an appearance that promises permanency; but in this world nothing can be said to be certain, except death and taxes,” said Benjamin Franklin in a letter to Jean-Baptiste Leroy in 1789. If he were around today, he might have added software bugs to the list of unavoidable things.
Software and bugs go together, and the Linux Kernel is no exception. Some bugs are easy to find, but some are harder to reproduce and could require several attempts to piece together the right set of conditions to trigger them. While bugs that result in a system crash or hang are easier to spot, it is often more challenging to gather the information necessary to debug and fix them. In many cases, Kernel logs can provide insight into these bugs, but when a system crashes or freezes, Kernel logs might not get the chance to be written out to disk.
Race and timing bugs are elusive and can also be hard to debug because traditional methods like debug logging can change the timing just enough to avoid triggering the bug.
Incorrect or unexpected feature behavior problems are often easier to debug and trace back to the offending module or sub-system. There are some pesky bugs, like the 4.4-rc1 VPN bug, for which there are too many suspects, making it hard to isolate the offending module. In this bug, a VPN will connect successfully, but subsequent web/ssh accesses fail on that connection. I spent several hours chasing the obvious suspects, including routers, switches, and network connections, before I remembered that the new element in my environment was the bleeding edge Kernel. The Kernel dmesg gave me a clue to search and find the patch that fixed the problem.
This article is for anyone who is interested in modifying the Linux Kernel, and it will cover a handful of strategies for addressing each of these types of bugs. Let’s get started!
Debugging Resources – What are they?
There is a lot of information that is available on a system during run-time that can aid in debugging.
Kernel Logging – dmesg, kern.log, and syslog provide useful information and aid in debugging. dmesg log levels can be customized to enable additional debug messages.
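As a small illustration of customizing log levels, the sketch below reads the current console log level, which is the first of the four values in /proc/sys/kernel/printk. The function name console_loglevel and its optional file argument are my own additions for illustration, not part of any tool.

```shell
# Print the current console log level (first field of the printk sysctl).
# The optional argument lets the function be pointed at a copy of the
# file; by default it reads the real /proc path.
console_loglevel() {
    awk '{ print $1; exit }' "${1:-/proc/sys/kernel/printk}"
}

# Possible usage, as root:
#   console_loglevel                    # show the current level
#   echo 8 > /proc/sys/kernel/printk    # let all messages, including debug, reach the console
#   dmesg --level=err,warn              # show only errors and warnings from the log buffer
```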
Check System State – These are useful methods to determine the current system state and the state of various devices (lspci, lsusb etc.). For example, “cat /proc/meminfo” gives information about memory usage and how much RAM is in use on the system.
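To show how this information can be turned into a quick check, here is a minimal sketch that computes the percentage of RAM in use from /proc/meminfo. It assumes a Kernel recent enough to report the MemAvailable field (3.14 or later); the function name mem_used_percent is hypothetical.

```shell
# Print the percentage of RAM in use, based on MemTotal and MemAvailable.
# Assumes the MemAvailable field is present (Kernel 3.14 or later).
# The optional argument is a file in /proc/meminfo format.
mem_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Possible usage:
#   mem_used_percent
```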
The following picture shows a handy list of Linux performance tools that can be used to observe the system at run-time and diagnose performance and other problems.
Kernel Errors for Everyone!
Oops Messages and System Hangs
Oops messages are printed to the console during system crashes. The Kernel might not get a chance to save these to the disk depending on how serious the problem is. In these cases it is helpful to redirect console messages to the serial port on another system. This method is useful for debugging early boot problems and panics.
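Console redirection is usually set up with console= parameters on the Kernel command line, for example console=ttyS0,115200n8 for the first serial port. The sketch below only lists which console parameters the running Kernel was booted with; the function name console_params is an assumption of mine.

```shell
# List the console= parameters found on the Kernel command line.
# The optional argument is a file in /proc/cmdline format.
console_params() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^console='
}

# Possible usage:
#   console_params
# A serial console is typically requested at boot with something like:
#   console=tty0 console=ttyS0,115200n8
```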
System hangs are hard to debug because the system becomes unresponsive, which makes it impossible to run any tools that provide insight into the system state. There is a Magic SysRq key, or key sequence, to which the Kernel will respond in any state unless it is completely locked up. On my laptop, the SysRq key and Print Screen are shared. Please refer to sysrq.txt under the Documentation directory in the Linux Kernel source code for more information on how to use this feature and to learn which Kernel config options need to be enabled to turn this on. Please note that it is a good idea for Kernel developers to enable the SysRq config options in the Kernel just in case the system runs into a hang.
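SysRq can also be checked and triggered from a shell while the system is still responsive, which is a handy way to confirm it is set up before a hang happens. The following is a minimal sketch; the function name sysrq_state is hypothetical, and the commands in the comments need root.

```shell
# Report whether Magic SysRq is enabled, based on the sysrq sysctl:
# 0 means disabled, 1 means all functions are enabled, and any other
# value is a bitmask of allowed functions (see sysrq.txt).
sysrq_state() {
    case "$(cat "${1:-/proc/sys/kernel/sysrq}")" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "restricted by bitmask" ;;
    esac
}

# Possible usage, as root:
#   sysrq_state
#   echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
#   echo w > /proc/sysrq-trigger      # dump tasks in uninterruptible (blocked) state
#   echo t > /proc/sysrq-trigger      # dump all tasks and their state
```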
There are a few debug tools that can aid in debugging problems with system hangs and oops messages, including GDB, which is useful in investigating Kernel addresses in Oops messages and figuring out which Kernel module and line of code is the culprit.
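As a rough sketch of that workflow, the function below pulls the symbol+offset/size entries out of a saved Oops message so they can be handed to GDB. The exact Oops layout varies between Kernel versions and architectures, so treat the pattern as an assumption; the function name oops_symbols and the symbol my_driver_probe in the comments are made up for illustration.

```shell
# Extract unique "symbol+0xoffset/0xsize" entries from an Oops message.
# Reads the file given as the first argument, or standard input.
oops_symbols() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' "${1:--}" | sort -u
}

# Possible usage:
#   dmesg | oops_symbols
# Then, with a vmlinux that was built with debug information, GDB can
# map an entry to a source line (drop the "/0xsize" part):
#   gdb -batch -ex 'list *(my_driver_probe+0x2a)' vmlinux
```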
The tools and methods we discussed so far are passive and non-intrusive. In some cases, it is necessary to use intrusive and pervasive approaches to gather diagnostic information to help debug problems that are hard to re-create. In such cases, KDB and Dynamic Probes can be used. Please refer to the presentation on Dynamic Event Tracing in Linux Kernel by Masami Hiramatsu for information on the Dynamic Probes feature.
kmemcheck and kmemleak are tools that can be used to detect potential memory-related bugs at runtime. These tools are in the Kernel source tree.
kmemcheck:
- Detects and warns about uninitialized memory
- CONFIG_KMEMCHECK should be enabled
kmemleak:
- Can be used to debug features and detect possible Kernel memory leaks in a similar way to a tracing garbage collector
- CONFIG_DEBUG_KMEMLEAK should be enabled
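For kmemleak, results are read through a debugfs file. Here is a minimal sketch that counts the suspected leaks in a report, assuming debugfs is mounted at /sys/kernel/debug; the function name kmemleak_count is hypothetical.

```shell
# Count suspected leaks in a kmemleak report. Each suspect starts with a
# line beginning "unreferenced object". The optional argument is a saved
# report; by default the live debugfs file is read.
kmemleak_count() {
    grep -c '^unreferenced object' "${1:-/sys/kernel/debug/kmemleak}"
}

# Possible usage, as root, on a Kernel built with CONFIG_DEBUG_KMEMLEAK:
#   echo scan > /sys/kernel/debug/kmemleak    # trigger an immediate scan
#   kmemleak_count                            # number of suspected leaks
#   cat /sys/kernel/debug/kmemleak            # full report with backtraces
#   echo clear > /sys/kernel/debug/kmemleak   # forget the current suspects
```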
Kernel Address Sanitizer (KASan) is another tool that can be used to find invalid memory accesses by the Kernel. This tool uses two things: a GCC feature to instrument the Kernel memory accesses, and a shadow memory to determine when the Kernel performs an invalid access to a memory address. Please refer to kasan.txt under the Documentation directory in the Linux Kernel source code for more information on how to enable and use KASan.
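Before relying on a tool like KASan, it helps to confirm that the running Kernel was actually built with it. The sketch below checks a Kernel config file for an option; it assumes the distribution installs the config as /boot/config-<release>, which is common but not universal, and the function name kconfig_enabled is my own.

```shell
# Succeed if the given option is set to y or m in a Kernel config file.
# $1 is the option name; $2 is the config file, which defaults to the
# /boot/config-<release> convention used by many distributions.
kconfig_enabled() {
    grep -q "^$1=[ym]\$" "${2:-/boot/config-$(uname -r)}"
}

# Possible usage:
#   kconfig_enabled CONFIG_KASAN && echo "KASan is built in"
# On Kernels built with CONFIG_IKCONFIG_PROC, the config can instead be
# read with: zcat /proc/config.gz
```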
The Linux Kernel also supports several debug interfaces including debug Kernel configuration options, debug APIs, dynamic debug, and tracepoints to name a few.
There are two different ways Kernel Debug Interfaces can be triggered:
- Per-callsite: a specific pr_debug() can be enabled using the line number in the source file, e.g. enabling pr_debug() in kernel/power/suspend.c at line 340:
  $ echo 'file suspend.c line 340 +p' >
- Per-module: pass dyndbg="+plmft" to modprobe, or change or create a modname.conf file in /etc/modprobe.d/. The latter persists across reboots. However, for drivers that get loaded from initramfs, change the grub configuration to pass in module.dyndbg="+plmft".
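To make the per-module case concrete, here is a small sketch that prints the line a modprobe.d file would contain. In the flags, p enables the message and l, m, f, and t add the line number, module name, function name, and thread ID to it. The function name dyndbg_modprobe_line and the module name mydriver are hypothetical.

```shell
# Print a modprobe.d "options" line that turns on dynamic debug for a
# module. $1 is the module name; $2 is the flags change, default "+p".
dyndbg_modprobe_line() {
    printf 'options %s dyndbg=%s\n' "$1" "${2:-+p}"
}

# Possible usage, as root ("mydriver" is a made-up module name):
#   dyndbg_modprobe_line mydriver +plmft > /etc/modprobe.d/mydriver.conf
# The one-off equivalent at load time would be:
#   modprobe mydriver dyndbg=+plmft
```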
Several Kernel modules have debug configuration options that can be enabled at compile time with no dynamic control to enable/disable. Debug messages go to dmesg, and these include lock debugging (spinlocks, mutexes, etc…), debug lockups and hangs, read-copy update debugging, and memory debugging.
For example, the DMA-Debug API is designed for debugging driver DMA API usage errors. When CONFIG_HAVE_DMA_API_DEBUG and CONFIG_DMA_API_DEBUG options are enabled, DMA APIs are instrumented to call into a special DMA debug interface. This debug interface runs checks to diagnose errors in DMA API usage; it does this by keeping track of per-device DMA mapping information. This is used to detect unmap attempts on addresses that aren’t mapped in addition to missing mapping error checks in driver code after a DMA map attempt.
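The DMA debug code also exposes its counters through debugfs. A minimal sketch for reading the error count follows, assuming debugfs is mounted at /sys/kernel/debug; the function name dma_debug_errors and the driver name mydriver are hypothetical.

```shell
# Print the total number of DMA API usage errors found so far.
# The optional argument is the dma-api debugfs directory.
dma_debug_errors() {
    cat "${1:-/sys/kernel/debug/dma-api}/error_count"
}

# Possible usage, as root, on a Kernel built with CONFIG_DMA_API_DEBUG:
#   dma_debug_errors
#   echo mydriver > /sys/kernel/debug/dma-api/driver_filter   # limit output to one driver
```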
Kernel debug options require recompiling the Kernel and add overhead that could change timing. As a result, these aren’t a good choice for debugging problems that result from race conditions and timing-related problems.
Per-Callsite Dynamic Debugging
Dynamic debug, on the other hand, allows dynamic enabling/disabling of pr_debug(), dev_dbg(), print_hex_dump_debug(), and print_hex_dump_bytes() per call site. The CONFIG_DYNAMIC_DEBUG option controls whether the feature is available. Dynamic debug options can be specified using the /sys/kernel/debug/dynamic_debug/control virtual file. Passing the dynamic_debug.verbose=1 kernel boot option will increase the verbosity. Please refer to Documentation/dynamic-debug-howto.txt for more information.
Even though dynamic debug allows selectively enabling messages, it creates extra overhead for each message, even when nothing is printed. In many cases, additional checks are run to determine if any messages need printing.
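The control file can be read as well as written, which makes it easy to see what is currently switched on. Each line has the form "filename:lineno [module]function flags format", and a flags value of "=_" means the call site is disabled. The sketch below lists the enabled call sites; the function name dyndbg_enabled_sites is hypothetical.

```shell
# List the call sites that currently have any dynamic debug flag set.
# Skips the header comment and every line whose flags field is "=_".
# The optional argument is a file in the control file format.
dyndbg_enabled_sites() {
    awk '!/^#/ && $3 != "=_" { print $1, $2, $3 }' \
        "${1:-/sys/kernel/debug/dynamic_debug/control}"
}

# Possible usage, as root:
#   echo 'module usbcore +p' > /sys/kernel/debug/dynamic_debug/control
#   dyndbg_enabled_sites
#   echo 'module usbcore -p' > /sys/kernel/debug/dynamic_debug/control
```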
The final feature I’ll cover here is tracepoints, which can be used for debugging, event reporting, and performance accounting. They can be enabled at run-time; however, they differ from dynamic debug in that they are inactive unless they are specifically enabled. Tracepoints use jump labels, which work by modifying a branch in the code. While a tracepoint is disabled, execution simply continues past it and, unlike dynamic debug, it adds almost no overhead. When it is enabled, the code is modified so that a jmp takes execution to the tracepoint code.
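Tracepoints are typically switched on and off through per-event "enable" files in the tracing directory. The following is a minimal sketch, assuming the tracing directory is at /sys/kernel/debug/tracing (newer Kernels also provide /sys/kernel/tracing); the function name tracepoint_set is hypothetical.

```shell
# Enable or disable one tracepoint event.
# $1 = event as subsystem/name, e.g. sched/sched_switch
# $2 = 1 to enable, 0 to disable
# $3 = tracing directory (optional)
tracepoint_set() {
    echo "$2" > "${3:-/sys/kernel/debug/tracing}/events/$1/enable"
}

# Possible usage, as root:
#   tracepoint_set sched/sched_switch 1
#   head /sys/kernel/debug/tracing/trace
#   tracepoint_set sched/sched_switch 0
```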
In addition, there are several debug modules, like test_firmware and test_bpf, that test specific functionality. The Kernel includes several developer regression tests in its Kselftest sub-system. These tests can be run to find any regressions in a newly released Kernel.
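Kselftest is driven by make targets in the Kernel source tree. As a small sketch, the helper below only builds the command line, either for all tests or for selected ones, so it can be inspected before it is run; the helper name kselftest_cmd is my own.

```shell
# Print the make command that runs Kselftest: all tests when called
# without arguments, or only the named test directories otherwise.
kselftest_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "make kselftest"
    else
        echo "make TARGETS=\"$*\" kselftest"
    fi
}

# Possible usage, from the top of the Kernel source tree:
#   kselftest_cmd timers          # prints: make TARGETS="timers" kselftest
#   eval "$(kselftest_cmd timers)"
```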
Make Your Debugging Count
As a closing thought, I would caution you to watch out for a few things when using debug options:
- They may add performance cost
- They might use non-stack allocations which result in extra overhead for allocating memory, increasing the memory footprint
- They might execute non-optimal code paths.
- They aren’t good for debugging races/timing bugs and could generate too many debug messages.
This guide should be helpful for anyone looking to debug issues with the Linux Kernel. If you have any questions, feel free to post them in the comments section. Happy Debugging!
About Shuah Khan
Shuah Khan is a Senior Linux Kernel Developer at Samsung's Open Source Group. She is a Linux Kernel Maintainer and Contributor who focuses on Linux Media Core and Power Management. She maintains the Kernel Selftest framework and has contributed to the IOMMU and DMA areas. In addition, she is helping with stable release kernel testing. She authored the Linux Kernel Testing and Debugging paper published in Linux Journal and writes Linux Journal kernel news articles. She has presented at several Linux conferences and Linux Kernel Developer Keynote Panels. She served on the Linux Foundation Technical Advisory Board. Prior to joining Samsung, she worked as a kernel and software developer at HP and Lucent.
Image Credits: Open Source Way