September 23, 2016 - Mauro Carvalho Chehab
The Problem with Linux Kernel Documentation, and How We’re Fixing it
This article is the first in a series on improvements to Linux Kernel documentation.
The Linux Kernel has one of the biggest communities in the open source world, and the numbers are impressive: over 4,000 contributors per year, producing about 8 changes per hour. That amounts to 4,600 lines of code added every day and a major release every 9-10 weeks. At this pace, it’s impossible for a traditional printed book to keep up: by the time a book is written, reviewed, and published, many changes have already been merged upstream. So, the best way to keep documentation up to date is to keep it close to the source code. That way, when something changes, the developer who wrote the change can also update the corresponding documents. That works great in theory, but it is not as effective as one might think.
The Old Methods of Kernel Documentation
For a long time, the Linux Kernel used two different methods of documentation: plain text files and DocBook. DocBook has been used to document function calls; that documentation is produced from the source code via special markup tags that are parsed by a script (kernel-doc). The Linux Kernel Media subsystem userspace API was also documented in the DocBook format. Yet, most of the Kernel documentation lives in more than 4,000 plain text files, belonging to different categories and subsystems, without an index.
The model used for documentation has several issues:
- There are no global indexes for the text files. So, a developer interested in digging into the documentation would need a certain amount of time and patience to use tools like grep in order to navigate those files.
- The text files can’t be enriched with images, diagrams (except for ASCII art), accurately-represented math expressions, notes, or text highlighting.
- Writing a document in DocBook is tricky; most developers don’t like doing it since it’s a new language to learn.
- Reviewing patches for DocBook files is not trivial because of the XML markup inside the text. This makes it harder to see what changed in multi-paragraph edits.
- Due to the nature of XML, breaks in documentation are common. A <foo> XML tag should be closed with a </foo> tag. If a close tag is forgotten or misplaced, the XML parser will reject the document.
- It’s not a trivial task to check if a new or modified API had the documentation updated accordingly, especially in the case of the text files.
With all these problems, it is easy to see why, in practice, the Kernel documentation has huge gaps in some areas.
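The fragility of XML mentioned in the list above is easy to reproduce. The following is a minimal sketch, not part of the Kernel toolchain: it uses Python's standard xml.etree.ElementTree parser, and the helper name is_well_formed and both sample strings are hypothetical, made up for illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise.

    Hypothetical helper for illustration; it is not part of kernel-doc
    or of any DocBook tool.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # A missing or misplaced closing tag ends up here.
        return False


# Made-up sample fragments: the second one forgets to close <foo>.
good = "<para>Some <foo>text</foo> here.</para>"
bad = "<para>Some <foo>text here.</para>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A single forgotten closing tag is enough for the parser to reject the whole fragment, which is why a small editing slip in a DocBook file can break the documentation build.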
Kernel Documentation that Makes Sense
For quite some time, there were discussions about using a text markup language. The advantage of a markup language is that the text can be easily read as-is by a reviewer, while it can also be parsed by a toolchain in order to produce enriched documentation.
The Kernel documentation maintainer (Jonathan Corbet) wrote an article back in January 2016 describing the issues and mentioning a few alternatives for a markup language:
Markdown is one of the most widely accepted markup languages. The problem with pure Markdown is that its spec lacks several needed features, like table support. AsciiDoc is a better alternative; it was written as a replacement for DocBook, so it has all the necessary features. Yet, the original toolchain for working with AsciiDoc is not maintained anymore. There is a new tool meant to replace it (Asciidoctor), but it’s a new toolchain written in Ruby; Kernel developers were concerned that it wouldn’t be stable enough for our needs. There were also some concerns about maintaining a toolchain in Ruby, which is not a language traditionally used in the Kernel community.
So, after a lot of discussion, we decided to use ReStructuredText because it is more stable, fits the need, and it is easy to extend. Since we also needed to select a toolchain, we decided to use Sphinx. Jani Nikula wrote an article series describing the problem and how the new solution works.
ReStructuredText Markup Language
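The ReStructuredText markup (a.k.a. ReST) is a full-featured markup language, where most of the markup looks like plain text. So, for example, a title is simply a line of text underlined with a row of punctuation characters, such as =, at least as long as the title itself.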
Emphasis uses single asterisks for italic and double asterisks for bold:
*this is italic*
**this is bold**
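To illustrate why such markup is easy for a toolchain to process while staying readable as-is, here is a toy sketch in Python. It is not how Sphinx or docutils work internally: the function name render_emphasis is hypothetical, and the converter handles only the two emphasis forms shown above.

```python
import re


def render_emphasis(text: str) -> str:
    """Toy converter: turns **bold** and *italic* markup into HTML tags.

    Hypothetical helper for illustration only; a real ReST parser
    handles many more constructs and edge cases.
    """
    # Handle bold first so the double asterisks are not consumed as italics.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


print(render_emphasis("*this is italic*"))  # <em>this is italic</em>
print(render_emphasis("**this is bold**"))  # <strong>this is bold</strong>
```

The input lines remain perfectly legible to a patch reviewer, yet two short substitutions are enough to produce enriched output from them.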
For more details about the basics, please consult the quick reference guide.
There are, however, some limits on its markup. For example, representing big, complex tables with cell spans is not trivial. This was a limitation when converting the existing documentation, specifically the Linux Media documentation, so a Sphinx extension called flat-table was developed to allow the Kernel to represent complex tables. With this extension, we were able to begin the migration of the Linux Media documentation to the new format.
Now that I’ve explained the problem with Linux Kernel documentation and the chosen path forward, in the next article of this series I’ll describe the efforts made to convert the old Linux Media Infrastructure API DocBook document to ReST. Stay tuned!
About Mauro Carvalho Chehab
Mauro is the upstream maintainer of the Linux kernel media and EDAC subsystems, and also a major contributor to the Reliability, Availability, and Serviceability (RAS) subsystems. Mauro also maintains Tizen on Yocto packages upstream. He has worked for the Samsung Open Source Group since 2013. He also worked for 5 years on the Red Hat RHEL Kernel team and has broad experience in telecommunications from his work at some of Brazil’s largest wired and wireless carriers.