December 1, 2017 - Mauro Carvalho Chehab

Linux Kernel License Practices Revisited with SPDX®

The licensing text in the Linux kernel source files is inconsistent in wording and format. Typically, each of its ~100k files carries a license text describing which license applies to that file. While all licenses are GPLv2 compatible, properly identifying the licenses that apply to a specific file is very hard. To address this problem, a group of developers recently set out to use SPDX® to research and map these inconsistencies in the licensing text. As a result of this 10-month effort, the Linux 4.14 release includes changes that make the licensing text consistent across the kernel source files and modules.

Linux Kernel License

As stated in its COPYING file, the Linux kernel’s default license is GPLv2, with an exception that grants additional rights to the kernel users:

The kernel’s COPYING file produces two practical effects:

  1. User-space applications can use non-GPL licenses by relying on the above-mentioned exception.
  2. It allows using different licenses in the kernel’s source files, when explicitly defined as such.

The Current Kernel License Model

A common practice is to add a comment at the beginning of each file with some sort of text like:

However, Philippe Ombredanne’s research showed that:

  • there are 64,742 distinct license statements
    • … in 114,597 blocks of text
    • … in 42,602 files
  • license statements represent 480,455 lines of text;
  • licenses are worded in 1,015 different ways;
  • there are about 85 distinct licenses, the bulk being the GPL.

Also, before kernel 4.14, about 11,000 files carried no license text at all; because of the COPYING file, they default to GPLv2. This inconsistency makes it hard to determine which licenses apply to a particular kernel version.

Software Package Data Exchange® (SPDX®)

The Linux Foundation has sponsored the SPDX® project to solve the license identification challenges in open source software. The goal of the project is to provide license identifiers inside the source code that can be easily parsed by machines, making it easier to check an open source project for license compliance.

In practice, supporting SPDX® inside source code is as simple as adding an SPDX® tag (SPDX-License-Identifier) with the license that applies (usually, GPL-2.0). If you’re the copyright holder, you may also consider removing the now redundant licensing text.
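Because the tag is a single fixed-format line, a script can pick it out with very little logic. The following is a minimal sketch, not the kernel's own tooling: the function name `find_spdx_expression` and the choice to look only at the first five lines are assumptions made for illustration.

```python
import re

# Matches "SPDX-License-Identifier: <expression>" in any comment style.
# A trailing "*/" from a C block comment is not part of the expression.
SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*(?:\*/)?\s*$"
)


def find_spdx_expression(text, max_lines=5):
    """Return the SPDX expression found near the top of `text`, or None.

    `max_lines` is an assumed limit: the tag is expected at the very
    beginning of a file, so only the first few lines are inspected.
    """
    for line in text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            return match.group("expr")
    return None


if __name__ == "__main__":
    sample = "// SPDX-License-Identifier: GPL-2.0\n#include <linux/module.h>\n"
    print(find_spdx_expression(sample))
```

A file with no tag in its first lines yields `None`, which is how a checker could flag files that still need one.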

Example commits that add SPDX® tags and clean up redundant license text in the USB over IP driver:

Depending on the type of source file, the tag takes one of the forms below:

SPDX® license identifier tags in the source code

Type of file    SPDX license tag
C source        // SPDX-License-Identifier:
C header        /* SPDX-License-Identifier: */
ASM             /* SPDX-License-Identifier: */
scripts         # SPDX-License-Identifier:
.rst            .. SPDX-License-Identifier:
.dts{i}         // SPDX-License-Identifier:
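The comment styles in the table can be turned into a small lookup that builds the tag line for a given file. This is only a sketch: the mapping from file extensions to table rows (for example `.S` for ASM, `.sh` and `.py` for scripts) and the function name `spdx_tag_line` are illustrative assumptions, not a complete kernel rule set.

```python
import os

# Comment styles taken from the table above.
C_LINE = "// SPDX-License-Identifier: {expr}"
C_BLOCK = "/* SPDX-License-Identifier: {expr} */"
HASH = "# SPDX-License-Identifier: {expr}"
RST = ".. SPDX-License-Identifier: {expr}"

# Assumed extension-to-style mapping, for illustration only.
STYLE_BY_EXTENSION = {
    ".c": C_LINE,
    ".h": C_BLOCK,
    ".S": C_BLOCK,
    ".sh": HASH,
    ".py": HASH,
    ".rst": RST,
    ".dts": C_LINE,
    ".dtsi": C_LINE,
}


def spdx_tag_line(filename, expr="GPL-2.0"):
    """Build the SPDX tag line suited to the file type of `filename`."""
    ext = os.path.splitext(filename)[1]
    if ext not in STYLE_BY_EXTENSION:
        raise ValueError(f"no SPDX comment style known for {filename!r}")
    return STYLE_BY_EXTENSION[ext].format(expr=expr)


if __name__ == "__main__":
    for name in ("foo.c", "foo.h", "foo.rst"):
        print(name, "->", spdx_tag_line(name))
```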

Replacing the license text inside each source file with a single SPDX license tag will likely shrink the kernel source files by ~400k lines, which is a nice cleanup, and will resolve the problems caused by ~64k different license statements.

Future Work

Besides the license on each kernel source file, all Linux modules use the MODULE_LICENSE() macro to specify their individual license. Right now, the following values for the macro are valid for a module within the official kernel sources:

Types of licenses for the MODULE_LICENSE() macro

Macro argument                 License
"GPL"                          GNU Public License v2 or later
"GPL v2"                       GNU Public License v2 only
"GPL and additional rights"    GNU Public License v2 rights and more
"Dual BSD/GPL"                 GNU Public License v2 or BSD license choice
"Dual MIT/GPL"                 GNU Public License v2 or MIT license choice
"Dual MPL/GPL"                 GNU Public License v2 or MPL license choice

Any other value “taints” the kernel when such a module is loaded, in order to inform the user that a proprietary module was loaded.
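The rule amounts to a membership test against the accepted strings. The sketch below models it in Python purely for illustration; it is not the kernel's implementation, and the function name `module_license_taints` is hypothetical.

```python
# Accepted MODULE_LICENSE() strings, copied from the table above.
ACCEPTED_MODULE_LICENSES = frozenset({
    "GPL",
    "GPL v2",
    "GPL and additional rights",
    "Dual BSD/GPL",
    "Dual MIT/GPL",
    "Dual MPL/GPL",
})


def module_license_taints(license_string):
    """Return True if loading a module with this license string would
    taint the kernel, i.e. the string is not one of the accepted values."""
    return license_string not in ACCEPTED_MODULE_LICENSES


if __name__ == "__main__":
    for value in ("GPL v2", "Dual MIT/GPL", "Proprietary"):
        print(repr(value), "taints:", module_license_taints(value))
```

The comparison is exact, so a string such as "gpl" or "GPLv2" counts as unknown in this sketch.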

With the addition of SPDX® to the kernel, the next step will be to use SPDX tags for the module license. The current plan seems to be to replace the MODULE_LICENSE() macro with MODULE_LICENSE_SPDX(), which would take the same SPDX identifier used in the source code and convert it to the values above to keep backward compatibility. The method to achieve this is still under discussion on the Linux Kernel Mailing List.
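Since the method is still under discussion, the conversion can only be sketched under assumptions. In the example below, every entry in the mapping table (including which BSD and MPL versions correspond to the "Dual" strings) and the function name `module_license_from_spdx` are hypothetical, chosen only to show the shape such a conversion could take.

```python
# Hypothetical mapping from SPDX expressions to legacy MODULE_LICENSE()
# strings. The pairs below are assumptions for illustration, not an
# agreed kernel mapping.
SPDX_TO_MODULE_LICENSE = {
    frozenset({"GPL-2.0"}): "GPL v2",
    frozenset({"GPL-2.0+"}): "GPL",
    frozenset({"GPL-2.0", "BSD-3-Clause"}): "Dual BSD/GPL",
    frozenset({"GPL-2.0", "MIT"}): "Dual MIT/GPL",
    frozenset({"GPL-2.0", "MPL-1.1"}): "Dual MPL/GPL",
}


def module_license_from_spdx(expr):
    """Map an SPDX expression to a legacy license string, or None.

    Only a single identifier or a flat "A OR B" choice is handled;
    anything else is reported as unknown by returning None.
    """
    cleaned = expr.strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = cleaned[1:-1]
    # The order of the alternatives does not matter, so compare as a set.
    alternatives = frozenset(part.strip() for part in cleaned.split(" OR "))
    return SPDX_TO_MODULE_LICENSE.get(alternatives)


if __name__ == "__main__":
    for spdx in ("GPL-2.0", "(MIT OR GPL-2.0)", "Apache-2.0"):
        print(spdx, "->", module_license_from_spdx(spdx))
```

Returning `None` for an unrecognized expression leaves it to the caller to decide how to treat it.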

Additional References:

LWN.net has an interesting article covering other aspects: SPDX identifiers in the kernel. The Free Software Foundation Europe has a set of best practices for license identification in source code, called REUSE.

The SPDX Trademark is owned by the Linux Foundation.

About Mauro Carvalho Chehab

Mauro is the upstream maintainer of the Linux kernel media and EDAC subsystems, and also a major contributor to the Reliability, Availability and Serviceability (RAS) subsystems. Mauro also maintains Tizen on Yocto packages upstream. He has worked for the Samsung Open Source Group since 2013. He previously worked for 5 years on the Red Hat RHEL Kernel team, and has broad experience in telecommunications from his work at some of Brazil's largest wired and wireless carriers.
