April 29, 2015 - Shuah Khan
What Is IOMMU Event Tracing?
The IOMMU event tracing feature enables reporting IOMMU events in the Linux Kernel as they happen during boot-time and run-time. IOMMU event tracing provides insight into IOMMU device topology in the Linux Kernel. This information helps you understand which IOMMU group a device belongs to, and shows run-time device assignment changes as devices are moved from hosts to guests and back by the Kernel. The Linux Kernel moves a device from host to guest when a user requests such a change.
In addition, IOMMU event tracing helps debug BIOS and firmware problems related to IOMMU hardware and firmware implementation, IOMMU drivers, and device assignment. For example, events are traced when a device is detached from the host and assigned to a virtual machine, that is, when it moves from the host domain to the VM domain, so each of these steps can be debugged. The primary purpose of IOMMU event tracing is to help detect and solve performance issues.
Enabling IOMMU event tracing will provide useful information about devices that are using IOMMU as well as changes that occur in device assignments. In this article, I’ll discuss the IOMMU event tracing feature and the various classes of IOMMU events. In part two of this series, I’ll discuss how to enable and use it to trace events during boot-time and run-time, and how to use the IOMMU tracing feature to get insight into what’s happening in virtualized environments as devices get assigned from hosts to virtual machines and vice versa. This feature helps debug IOMMU problems during development, maintenance, and support.
What is an IOMMU?
IOMMU is short for I/O Memory Management Unit. An IOMMU is hardware that translates device (I/O) addresses to the physical (machine) address space. An IOMMU can be viewed as an MMU for devices. An MMU maps virtual addresses into physical addresses. Similarly, an IOMMU maps device addresses into physical addresses. The following picture shows a comparative depiction of IOMMU vs. MMU.
In addition to basic mapping, the IOMMU provides device isolation via access permissions. Mapping requests are allowed or disallowed based on whether or not the device has proper permissions to access a certain memory region. Another key feature the IOMMU brings to the table is I/O Virtualization, which provides DMA remapping hardware that adds support for the isolation of device accesses to memory, as well as translation functionality. In other words, devices present I/O addresses to the IOMMU, which translates them into machine addresses, thereby bridging the gap between device addressing capability and the system memory range.
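To make the translate-and-check idea concrete, here is a minimal sketch of a toy translation table in Python. It is not how any real IOMMU or the Linux Kernel implements translation; the class name ToyIommu, the 4 KiB page size, and the addresses are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyIommu:
    """Toy model of an IOMMU translation table with per-page permissions."""

    def __init__(self):
        # I/O page number -> (physical page number, writable flag)
        self.table = {}

    def map_page(self, iova, paddr, writable):
        """Record that the page containing iova maps to the page containing paddr."""
        self.table[iova // PAGE_SIZE] = (paddr // PAGE_SIZE, writable)

    def translate(self, iova, write=False):
        """Translate a device (I/O) address, refusing unmapped or forbidden accesses."""
        entry = self.table.get(iova // PAGE_SIZE)
        if entry is None:
            raise PermissionError("no mapping for I/O address 0x%x" % iova)
        phys_page, writable = entry
        if write and not writable:
            raise PermissionError("write to read-only mapping at 0x%x" % iova)
        return phys_page * PAGE_SIZE + iova % PAGE_SIZE


iommu = ToyIommu()
iommu.map_page(0x1000, 0x7F000, writable=False)  # hypothetical addresses
print(hex(iommu.translate(0x1234)))  # a read through the mapping succeeds
```

A write through the same read-only mapping, or any access to an unmapped I/O address, raises PermissionError in this model, which is the isolation property described above.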
What Does an IOMMU Do?
IOMMU hardware provides several key features that enhance I/O performance on a system.
- On systems that support IOMMU, a single contiguous virtual memory region can be mapped to multiple non-contiguous physical memory regions. IOMMU can make a non-contiguous memory region appear contiguous to a device.
- Scatter/gather optimizes streaming DMA performance for the I/O device.
- Memory isolation and protection allows a device to access only the memory regions that are mapped for it. As a result, faulty and/or malicious devices can’t corrupt system memory.
- Memory isolation allows safe device assignment to virtual machines without compromising host and other guest operating systems. Similar to the faulty and/or malicious device case, devices are given access to memory regions which are mapped specifically for them. As a result, devices assigned to virtual machines will not have access to the host or another virtual machine’s memory regions.
- IOMMU helps address discrepancies between I/O device and system memory addressing capabilities. For example, IOMMU enables 32-bit DMA capable non-DAC devices to access memory regions above 4GB.
- IOMMU supports hardware interrupt remapping. This feature expands limited hardware interrupts to extendable software interrupts, thereby increasing the number of interrupts that can be supported. The primary uses of interrupt remapping are interrupt isolation and the ability to translate between interrupt domains, for example IOAPIC vs. x2APIC on x86.
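Two of the points above, contiguous I/O ranges over scattered physical pages and 32-bit devices reaching memory above 4GB, can be sketched together. This is a toy model only; the function names, the page size, and the addresses are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


def build_contiguous_mapping(iova_base, phys_pages):
    """Map scattered physical pages to one contiguous I/O address range."""
    return {iova_base + i * PAGE_SIZE: paddr for i, paddr in enumerate(phys_pages)}


def translate(mapping, iova):
    """Translate an I/O address using the page-granular mapping."""
    offset = iova % PAGE_SIZE
    return mapping[iova - offset] + offset


# Three non-contiguous physical pages, two of them above 4GB (hypothetical).
phys_pages = [0x1_2000_0000, 0x3000, 0x2_0000_1000]
mapping = build_contiguous_mapping(0x10000, phys_pages)

# The device sees one contiguous 12 KiB buffer that fits in 32 bits.
for iova in sorted(mapping):
    print(hex(iova), "->", hex(mapping[iova]))
```

Every I/O address in this mapping is below 4GB, so a device limited to 32-bit DMA addresses could use the whole buffer even though part of it lives in high memory.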
As we all know, there is no free lunch! IOMMU introduces latency due to translation overhead in the dynamic DMA mapping path. However, most servers support I/O Translation Lookaside Buffer (IOTLB) hardware to reduce the translation overhead.
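The effect of an IOTLB can be illustrated with a small cache in front of a translation table. The sketch below is a simplified software analogy under assumed names (ToyIotlb, a two-entry capacity, least-recently-used eviction); real IOTLB hardware differs in size, replacement policy, and invalidation behavior.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyIotlb:
    """Small LRU cache of page translations in front of a slower page table."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table  # I/O page number -> physical page number
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, iova):
        page, offset = divmod(iova, PAGE_SIZE)
        if page in self.cache:
            self.hits += 1
            self.cache.move_to_end(page)  # mark as most recently used
        else:
            self.misses += 1  # stands in for a slow table walk
            self.cache[page] = self.page_table[page]
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[page] * PAGE_SIZE + offset


tlb = ToyIotlb({0: 10, 1: 11, 2: 12})
for iova in (0, 8, 4096):
    tlb.translate(iova)
print("hits:", tlb.hits, "misses:", tlb.misses)
```

Repeated accesses to the same page are served from the cache, which is why streaming DMA to a small set of pages pays the translation cost only once per page in this model.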
IOMMU Groups and Device Isolation
Devices are isolated in IOMMU groups. Each group contains a single device or a group of devices, but single device isolation is not always possible for a variety of reasons. Devices behind a bridge can communicate without reaching IOMMU via peer-to-peer communication channels. Unless the I/O hardware/firmware provides a way to disable peer-to-peer communication, IOMMU can’t ensure single device isolation and will be forced to place all the devices behind a bridge in a single IOMMU group for isolation.
Multi-function cards don’t always support the PCI access control services required to describe isolation between functions. In such cases, all functions on a multi-function card are placed in a single IOMMU group. Devices in a group can’t be separated for assignment, and all devices in that group must be assigned together, even when a virtual machine only needs one of them. For example, IOMMU might be forced to group all four ports on a multi-port card because device isolation at port granularity isn’t possible on all hardware.
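Independently of tracing, the resulting grouping can usually be inspected from user space. On Linux systems with an IOMMU enabled, groups are typically exposed under /sys/kernel/iommu_groups, one directory per group with a devices subdirectory. This sysfs layout is not described in this article, so treat the path as an assumption about your system; the helper name read_iommu_groups is hypothetical.

```python
import os


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_id: [device names]} from a sysfs-style directory tree.

    Returns an empty dict when the directory does not exist, for example
    when the IOMMU is disabled or unsupported.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


for group_id, devices in sorted(read_iommu_groups().items()):
    shared = " (must be assigned together)" if len(devices) > 1 else ""
    print("group %d: %s%s" % (group_id, ", ".join(devices), shared))
```

Any group that lists more than one device is a group whose members cannot be assigned separately, as described above.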
IOMMU Domains and Protection
IOMMU domains are intended to provide protection against a virtual machine corrupting another virtual machine’s memory. Devices get moved from one domain to another as they get moved between VMs or from a host to a VM. Any device in a domain is given access to the memory regions mapped for the specific domain it belongs to. When a device is assigned to a VM, it is first detached from the host and removed from the host domain, moved to the VM domain, and attached to the VM as shown below:
A Brief Overview of IOMMU Boot and Run-Time Operations
The IOMMU driver creates IOMMU groups and domains during initialization. Devices are placed in IOMMU groups based on their device isolation capabilities. iommu_group_add_device() is called when a device is added to a group and iommu_group_remove_device() is called when a device is removed from a group.
Initially, all devices are attached to the host. When a user requests that a device be assigned to a VM, the device gets detached from the host and then attached to the VM. iommu_attach_device() is called to attach a device and iommu_detach_device() is called to detach it. The iommu_map() and iommu_unmap() interfaces create and delete mappings between the device address space and the system address space.
A series of device additions occurs during boot. During run-time, after a device is attached, a series of device maps and unmaps occurs until the device is detached.
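The run-time sequence described above can be sketched as a toy model. The class ToyDomain and the helper assign_to_vm are hypothetical; their method names only loosely echo the kernel interfaces named in the text and do not reproduce the real signatures or behavior.

```python
class ToyDomain:
    """Toy IOMMU domain that records the operations performed on it."""

    def __init__(self, name, log):
        self.name = name
        self.log = log          # shared list of (operation, domain, detail)
        self.devices = set()
        self.mappings = {}      # iova -> (paddr, size)

    def attach_device(self, dev):
        self.devices.add(dev)
        self.log.append(("attach", self.name, dev))

    def detach_device(self, dev):
        self.devices.remove(dev)
        self.log.append(("detach", self.name, dev))

    def map(self, iova, paddr, size):
        self.mappings[iova] = (paddr, size)
        self.log.append(("map", self.name, hex(iova)))

    def unmap(self, iova):
        _paddr, size = self.mappings.pop(iova)
        self.log.append(("unmap", self.name, hex(iova)))
        return size  # number of bytes unmapped


def assign_to_vm(dev, host, vm):
    """Move a device from the host domain to a VM domain."""
    host.detach_device(dev)
    vm.attach_device(dev)


log = []
host, vm = ToyDomain("host", log), ToyDomain("vm", log)
host.attach_device("dev0")            # hypothetical device name
assign_to_vm("dev0", host, vm)
vm.map(0x1000, 0x8000, 4096)
vm.unmap(0x1000)
for entry in log:
    print(entry)
```

The resulting log has the same shape as the activity the trace events are meant to expose: an attach, a detach followed by an attach to the new domain, then maps and unmaps while the device stays attached.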
The ability to have visibility into device additions, deletions, attaches, detaches, maps, and unmaps is valuable in debugging IOMMU problems. As you can see below, this is exactly what IOMMU events are designed to do. In fact, the idea for this tracing work was a result of debugging several IOMMU problems without having a good insight into what’s happening. Let’s take a look at the trace events.
IOMMU Trace Event Classes
IOMMU events are classified into group, device, map and unmap, and error classes to trace activity in each of these areas. Group class events are generated whenever a device is added to or removed from a group. Device class events trace device attach and detach activity. Map and unmap events trace map/unmap activity. Finally, in addition to these normal path events, error class events trace autonomous IOMMU faults that might occur during boot-time and/or run-time.
IOMMU Group Class Events
IOMMU group class events are triggered during boot. Traces are generated when devices get added to or removed from an IOMMU group. These traces provide insight into IOMMU device topology and how the devices are grouped for isolation.
- Add device to a group – Triggered when the IOMMU driver adds a device to a group. Format: IOMMU: groupID=%d device=%s
- Remove device from a group – Triggered when the IOMMU driver removes a device from a group. Format: IOMMU: groupID=%d device=%s
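As a rough illustration of how these traces reveal topology, the sketch below parses event text in the format given above and rebuilds the group membership. The sample lines and device names are hypothetical, and the function name group_topology is an assumption; a real trace line will carry additional fields before this text.

```python
import re
from collections import defaultdict

# Matches the "IOMMU: groupID=%d device=%s" format described above.
GROUP_EVENT = re.compile(r"IOMMU: groupID=(\d+) device=(\S+)")


def group_topology(add_event_lines):
    """Build {group_id: [devices]} from add-device-to-group event text."""
    groups = defaultdict(list)
    for line in add_event_lines:
        match = GROUP_EVENT.search(line)
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return dict(groups)


# Hypothetical sample events, not captured from a real system.
sample = [
    "IOMMU: groupID=0 device=0000:00:00.0",
    "IOMMU: groupID=1 device=0000:00:02.0",
    "IOMMU: groupID=1 device=0000:00:02.1",
]
print(group_topology(sample))
```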
IOMMU Device Class Events
Events in this group are triggered during run-time, whenever devices are attached to and detached from domains, for example, when a device is detached from the host and attached to a guest. This information provides insight into device assignment changes during run-time.
- Attach (add) device to a domain – Triggered when a device gets attached (added) to a domain. Format: IOMMU: device=%s
- Detach (remove) device from a domain – Triggered when a device gets detached (removed) from a domain. Format: IOMMU: device=%s
IOMMU Map and Unmap Events
Events in this group are triggered during run-time whenever device drivers make IOMMU map and unmap requests. This information provides insight into map and unmap requests and helps debug performance and other problems.
- IOMMU map event – Triggered when the IOMMU driver services a map request. Format: IOMMU: iova=0x%016llx paddr=0x%016llx size=%zu
- IOMMU unmap event – Triggered when the IOMMU driver services an unmap request. Format: IOMMU: iova=0x%016llx size=%zu unmapped_size=%zu
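One way such traces might be used is to total the mapped and unmapped bytes and to flag unmap requests where unmapped_size differs from the requested size. The sketch below parses the two formats given above; the sample lines are hypothetical and the function name summarize_map_events is an assumption.

```python
import re

# Patterns follow the map and unmap formats described above.
MAP_EVENT = re.compile(
    r"IOMMU: iova=0x([0-9a-f]{16}) paddr=0x([0-9a-f]{16}) size=(\d+)")
UNMAP_EVENT = re.compile(
    r"IOMMU: iova=0x([0-9a-f]{16}) size=(\d+) unmapped_size=(\d+)")


def summarize_map_events(lines):
    """Return (bytes mapped, bytes unmapped, iovas of partial unmaps)."""
    mapped = unmapped = 0
    partial = []
    for line in lines:
        m = MAP_EVENT.search(line)
        if m:
            mapped += int(m.group(3))
            continue
        u = UNMAP_EVENT.search(line)
        if u:
            unmapped += int(u.group(3))
            if int(u.group(2)) != int(u.group(3)):
                partial.append(int(u.group(1), 16))
    return mapped, unmapped, partial


# Hypothetical sample events, not captured from a real system.
sample = [
    "IOMMU: iova=0x00000000fef00000 paddr=0x000000012a400000 size=4096",
    "IOMMU: iova=0x00000000fef01000 paddr=0x000000012a7f0000 size=8192",
    "IOMMU: iova=0x00000000fef00000 size=4096 unmapped_size=4096",
    "IOMMU: iova=0x00000000fef01000 size=8192 unmapped_size=4096",
]
print(summarize_map_events(sample))
```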
IOMMU Error Class Events
Events in this group are triggered during run-time when an IOMMU fault occurs. This information provides insight into IOMMU faults and is useful for logging the fault and taking measures to restart the faulting device. The information in the flags field is especially useful in debugging BIOS and firmware problems related to IOMMU hardware and firmware implementation, as well as problems resulting from incompatibilities between the OS, BIOS, and firmware in spec compliance.
- IO Page Fault (AMD-Vi): Triggered when an IO Page fault occurs on an AMD-Vi system. Format: IOMMU:%s %s iova=0x%016llx flags=0x%04x
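Fault events can be grouped per device to spot a repeatedly faulting device. The sketch below assumes that the two %s fields in the format are a driver or bus name followed by a device name; the article does not say what they contain, so that reading, the sample lines, and the function name faults_by_device are all assumptions.

```python
import re
from collections import defaultdict

# Follows "IOMMU:%s %s iova=0x%016llx flags=0x%04x"; the meaning of the
# two %s fields (driver/bus name, then device name) is an assumption.
FAULT_EVENT = re.compile(
    r"IOMMU:(\S+) (\S+) iova=0x([0-9a-f]{16}) flags=0x([0-9a-f]{4})")


def faults_by_device(lines):
    """Return {device: [(iova, flags), ...]} from IO page fault event text."""
    faults = defaultdict(list)
    for line in lines:
        match = FAULT_EVENT.search(line)
        if match:
            _first, device, iova, flags = match.groups()
            faults[device].append((int(iova, 16), int(flags, 16)))
    return dict(faults)


# Hypothetical sample events, not captured from a real system.
sample = [
    "IOMMU:pci 0000:00:02.0 iova=0x00000000deadb000 flags=0x0010",
    "IOMMU:pci 0000:00:02.0 iova=0x00000000deadc000 flags=0x0010",
]
for device, entries in faults_by_device(sample).items():
    print(device, [(hex(iova), hex(flags)) for iova, flags in entries])
```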
Error class events are implemented in common IOMMU driver code, Intel, and ARM.
How Can IOMMU Event Tracing Help You?
This article is part one of a two-part series on IOMMU event tracing. This introduction lays the foundation for the second article, which will cover how to use this feature to your best advantage. Stay tuned to this blog to learn more about IOMMU event tracing!
About Shuah Khan
Shuah Khan is a Senior Linux Kernel Developer at Samsung's Open Source Group. She is a Linux Kernel Maintainer and Contributor who focuses on Linux Media Core and Power Management. She maintains the Kernel Selftest framework. She has contributed to the IOMMU and DMA areas. In addition, she is helping with stable release kernel testing. She authored the Linux Kernel Testing and Debugging paper published in the Linux Journal and writes Linux Journal kernel news articles. She has presented at several Linux conferences and Linux Kernel Developer Keynote Panels. She served on the Linux Foundation Technical Advisory Board. Prior to joining Samsung, she worked as a kernel and software developer at HP and Lucent.
Image Credits: DTR