Using Nix to Fuzz Test a PDF Parser (Part One)

October 23, 2024 14-minute read

Fuzz testing is a technique for automatically uncovering bugs in software. The problem is that it’s a pain to set up. Read any fuzz testing tutorial, and the first task is an hour of building tools from source and chasing down dependencies upon dependencies.

I recently found that Nix eliminates a lot of the gruntwork from fuzz testing. I created a Nix configuration that kicks off a fuzz testing workflow with a single command. The only dependencies are Nix and git.

I used my Nix workflow to find an unpatched bug in a PDF renderer, even though I’m a beginner at both Nix and fuzz testing.

A preview of the solution

Here’s a preview of my final result: you can start fuzz testing an open-source PDF reader with a single command:

nix run gitlab:mtlynch/fuzz-xpdf

The command should work on any Linux system with Nix installed, and maybe macOS, too. After a few minutes of building, you should see a terminal UI that looks like this:

Screenshot of honggfuzz's terminal UI, showing progress fuzz testing pdftotext — Nix allows me to install all dependencies and begin fuzz testing in a single command.

Here’s everything that happens when you run the command above:

Nix downloads all tools and dependencies for the PDF reader and the testing toolchain.
Nix compiles the PDF reader from source with proper instrumentation for fuzz testing.
Nix downloads a set of edge-case PDFs for generating test inputs.
Nix launches the fuzzer, which automatically generates new PDFs, feeds them to the PDF reader, and reports which inputs caused the PDF reader to crash.

If you want to change the fuzzing options or test a different version of the PDF reader, it’s as simple as editing a single file.

I’m going to share how I created the fuzz testing workflow step by step. You can use the same methodology to find bugs in other projects.

If you’re impatient, you can skip to the end to see my final result.

What’s fuzz testing?

Fuzz testing or “fuzzing” is a way of finding bugs in software by randomly generating input data and checking to see if the input causes the target application to crash.

For example, to test a program that resized JPEG images, the workflow would look like this:

1. Take a set of valid and/or malformed JPEG files.
2. Randomly select one of the input files and randomly mutate it (flip some bits, add some data, delete some data).
3. Feed the mutated input file to the image resizing program.
4. If the mutated input caused the program to crash or hang, save the input for later analysis.
5. Go back to step (2).
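
To make the loop concrete, here is a rough sketch of the five steps above as a toy shell script. It is not how a real fuzzer works internally: the function names (mutate_file, run_target, fuzz) are made up for illustration, the only mutation is overwriting a single byte, and it assumes bash plus GNU coreutils (timeout, dd).

```shell
#!/usr/bin/env bash
# Toy mutation fuzzer: a sketch of the five steps above, not a real fuzzer.

# Step 2: copy a seed file and overwrite one randomly chosen byte.
mutate_file() {
  local seed="$1" out="$2"
  local size offset value
  size=$(( $(wc -c < "$seed") ))
  cp "$seed" "$out"
  [ "$size" -gt 0 ] || return 0
  offset=$(( (RANDOM << 15 | RANDOM) % size ))
  value=$(( RANDOM % 256 ))
  # Write a single byte at the chosen offset without truncating the file.
  printf '%b' "\\0$(printf '%03o' "$value")" |
    dd of="$out" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}

# Steps 3-4: run the target on one input and classify the outcome.
# An exit status above 128 conventionally means "killed by a signal"
# (such as SIGSEGV); timeout(1) returns 124 when the time limit expires.
run_target() {
  local input="$1"; shift
  local status=0
  timeout 5 "$@" "$input" >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 124 ]; then
    echo hang
  elif [ "$status" -gt 128 ]; then
    echo crash
  else
    echo ok
  fi
}

# Steps 1-5: pick a seed, mutate it, run the target, keep the bad inputs.
fuzz() {
  local corpus_dir="$1" crash_dir="$2" iterations="$3"; shift 3
  local seeds=("$corpus_dir"/*)
  local i seed candidate verdict
  mkdir -p "$crash_dir"
  candidate="$(mktemp)"
  for (( i = 0; i < iterations; i++ )); do
    seed="${seeds[RANDOM % ${#seeds[@]}]}"
    mutate_file "$seed" "$candidate"
    verdict="$(run_target "$candidate" "$@")"
    if [ "$verdict" != ok ]; then
      cp "$candidate" "$crash_dir/$verdict-$i"
    fi
  done
  rm -f "$candidate"
}
```

A hypothetical invocation would be `fuzz ./seeds ./crashes 1000 ./resize-jpeg`, where resize-jpeg stands in for whatever program takes a file path as its last argument. Real fuzzers add much smarter mutations and feedback from the target, which is why I use an existing tool instead of a script like this.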

What’s Nix?

Nix is a complex tool that does a lot of different things, many of which I don’t even understand.

For the purposes of this article, it’s sufficient to understand two things about Nix:

Nix is a package manager, similar to apt or yum. Nix has 100k+ packages available to run within the Nix environment.
Nix is a build tool, similar to make or Docker. Nix allows you to define a set of build steps and the dependencies between them. When you request a build from Nix, it performs all the required steps to create the result you requested.

Requirements

To follow along, you’ll only need two things:

Nix (with the flakes feature enabled)
- I recommend the Determinate Systems installer, which enables flakes by default.
git

Selecting a fuzzing target

The PDF reader I’m fuzz testing is called xpdf. It’s a PDF viewer, but it ships with a suite of PDF utilities. One of the utilities, pdftotext, is an attractive fuzzing target because it’s so simple. It has no GUI; it just accepts a PDF as input and produces plaintext as output. It still exercises xpdf’s complex PDF parsing code, so if I find a bug in pdftotext, it means I’ve probably found a bug in the whole xpdf suite.

Putting the Nix boilerplate in place

To start the project, I create a new folder and git repository.

mkdir fuzz-xpdf \
  && cd fuzz-xpdf \
  && git init

Next, I create a file called flake.nix:

{
  description = "compile xpdf from source for fuzzing";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
      in
      {
        packages = rec {
          default = xpdf;

          xpdf = pkgs.stdenv.mkDerivation rec {
            # TODO: I'll populate this next.
          };
        };
      }
    );
}

This is a Nix “flake,” which defines a set of Nix packages and applications.

So far, this is just a boilerplate skeleton of a Nix flake. Most of it is not worth discussing except this line:

nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

This tells Nix that when I want to pull in packages, I’m pulling them from the May 2024 branch of the package repository, the latest stable branch at the time of this writing.

The flake won’t build successfully yet. To compile xpdf using Nix, I need to add a few bits.

Specifying a source tarball

To compile xpdf, I need a copy of its source code.

First, I call mkDerivation, which is how Nix defines build components. It requires a package name (pname) and version, so I specify xpdf, the package I want to fuzz, and 4.05, the latest published version of xpdf, as of this writing.

{
  xpdf = pkgs.stdenv.mkDerivation rec {
    pname = "xpdf";
    version = "4.05";
    ...

The other required field in mkDerivation is a src property, which specifies how Nix should retrieve the inputs for the build. In the case of xpdf, the source tarball is located at this URL:

https://dl.xpdfreader.com/xpdf-4.05.tar.gz

I specify xpdf’s tarball URL using the pname and version variables so that when the version number changes in the future, the URL will still work:

{
  xpdf = pkgs.stdenv.mkDerivation rec {
    ...
    src = pkgs.fetchzip {
      url = "https://dl.xpdfreader.com/${pname}-${version}.tar.gz";
      extension = "tar.gz";
    };

The problem is that Nix needs a hash of the tarball to determine whether the local version matches what’s on the server. If I run nix build at this point, Nix complains that the hash is wrong:

warning: found empty hash, assuming 'sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
error: hash mismatch in fixed-output derivation '/nix/store/z3ckfdjqpfd73xkkwsnpg4ijwj60vyz8-source.drv':
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-LBxKSrXTdoulZDjPiyYMaJr63jFHHI+VCgVJx310i/w=

To fix the hash mismatch, I paste the value from the error message into my flake.nix:

{
  xpdf = pkgs.stdenv.mkDerivation rec {
    ...
    src = pkgs.fetchzip {
      url = "https://dl.xpdfreader.com/${pname}-${version}.tar.gz";
      # Paste the hash that appeared next to "got" in the error message.
      hash = "sha256-LBxKSrXTdoulZDjPiyYMaJr63jFHHI+VCgVJx310i/w=";
      extension = "tar.gz";
    };
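
If the all-A placeholder in the warning looks mysterious, this small sketch shows where it comes from. The hash strings above are in SRI format: the algorithm name, a dash, and the base64 encoding of the raw digest. A SHA-256 digest is 32 bytes, so 32 zero bytes encode to the placeholder. The function names here are my own, and the sketch assumes the base64 and od utilities are installed.

```shell
#!/usr/bin/env bash
# An SRI hash is "<algorithm>-<base64 of the raw digest>".

# Encode a 32-byte all-zero digest, which yields the placeholder that
# appears in the "found empty hash" warning.
placeholder_hash() {
  printf 'sha256-%s\n' "$(head -c 32 /dev/zero | base64)"
}

# Convert an SRI hash back to the more familiar hex form.
sri_to_hex() {
  # Strip everything up to the first dash, decode, and print as hex.
  # od needs -v so that it does not collapse repeated lines into "*".
  printf '%s' "${1#*-}" | base64 --decode | od -v -An -tx1 | tr -d ' \n'
  echo
}

placeholder_hash
sri_to_hex "$(placeholder_hash)"
```

The second command prints 64 zeros, confirming that the placeholder is a dummy digest rather than the hash of any real file.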

Compiling xpdf from source

Now that I’ve shown Nix how to retrieve xpdf’s source code, I have to figure out how to build that code.

The xpdf compile instructions list the following dependencies:

Make sure you have the following installed:
CMake 2.8.8 or newer
FreeType 2.0.5 or newer
Qt 5.x or 6.x (for xpdf only)
libpng (for pdftopng and pdftohtml)
zlib (for pdftopng and pdftohtml)

I only want to run pdftotext, so I only need CMake and FreeType.

Building a complex tool from source is usually a painful process. I want to build tool A, but it depends on library X, so I have to figure out how to install library X. It turns out library X depends on libraries Y and Z, so I have to figure out how to install those, and so on.

Nix radically simplifies the process of building from source in two ways:

Nix has one of the largest package repositories of any package manager, so most packages I need are already available.
Nix packages are not tied to any OS version, so as long as there’s a Nix package for my architecture, I can use it.

Looking at the Nix package repository, I see that packages for CMake and FreeType are indeed already available.

I assume I only need CMake at build time, not at runtime, which means it belongs under nativeBuildInputs. I probably need FreeType at runtime, so I specify it under buildInputs:

{
  xpdf = pkgs.stdenv.mkDerivation rec {
    ...

    # Build dependencies belong here.
    nativeBuildInputs = with pkgs; [
      cmake
    ];

    # Runtime dependencies belong here.
    buildInputs = with pkgs; [
      freetype
    ];

At this point, my flake.nix looks like this:

{
  description = "compile xpdf from source for fuzzing";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
      in
      {
        packages = rec {
          default = xpdf;

          xpdf = pkgs.stdenv.mkDerivation rec {
            pname = "xpdf";
            version = "4.05";

            src = pkgs.fetchzip {
              url = "https://dl.xpdfreader.com/${pname}-${version}.tar.gz";
              hash = "sha256-LBxKSrXTdoulZDjPiyYMaJr63jFHHI+VCgVJx310i/w=";
              extension = "tar.gz";
            };

            nativeBuildInputs = with pkgs; [
              cmake
            ];

            buildInputs = with pkgs; [
              freetype
            ];
          };
        };
      }
    );
}

When I build with Nix, it generates output under a folder called result, so I create a file called .gitignore that excludes that folder from source control:

echo 'result' > .gitignore

Next, I add everything to my git repository:

git add --all

Note: An annoying gotcha of Nix flakes is that Nix can’t see files unless they’re under git source control. If you get error messages about “file not found,” check that you’ve added the file to git.
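
A quick way to check for this gotcha is to ask git which files it isn't tracking yet. This is a small sketch using git ls-files; the function name untracked_files is just a label I picked.

```shell
#!/usr/bin/env bash
# List files that exist in the working tree but that git is not tracking
# (and that .gitignore does not exclude). Run it from inside the repository.
untracked_files() {
  git ls-files --others --exclude-standard
}
```

If `untracked_files` prints anything that the flake depends on, running `git add` on those files should make them visible to Nix.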

Finally, I build the package from source with nix build:

nix build

If everything worked, there should be a set of binaries under ./result/bin that I can run:

$ ls ./result/bin/
pdfdetach  pdffonts  pdfimages  pdfinfo  pdftohtml  pdftopng  pdftoppm  pdftops  pdftotext

Sure enough, pdftotext works correctly:

$ ./result/bin/pdftotext -v
pdftotext version 4.05 [www.xpdfreader.com]
Copyright 1996-2024 Glyph & Cog, LLC

As a test, I downloaded the Form W-4 PDF from the IRS website and fed it to pdftotext:

$ ./result/bin/pdftotext fw4.pdf /dev/stdout | head -n 5
Form W-4
Department of the Treasury Internal Revenue Service

Employee's Withholding Certificate
Complete Form W-4 so that your employer can withhold the correct federal income tax from your pay. Give Form W-4 to your employer.

Cool, that looks correct.

The full source at this stage is available on GitLab.

That was confusingly easy

If you’re confused about how Nix built the xpdf binaries, so was I.

I hadn’t even told Nix what the build process was for xpdf, so how did it know?

It turns out that the Nix mkDerivation function I called assumes a standard make build process:

for Unix packages that use the standard ./configure; make; make install build interface, you don’t need to write a build script at all; the standard environment does everything automatically. If stdenv doesn’t do what you need automatically, you can easily customise or override the various build phases.
“The Standard Environment” from the Nixpkgs Manual

Still, it seemed a bit too magical to me.

The xpdf instructions explain how you have to tell the compiler where to find FreeType’s headers and libraries. I never did that, so how was Nix compiling the project anyway?

And make install normally writes to a system-wide directory like /usr/bin, so how did that happen if I never elevated to root privileges with sudo?

I suspected that, in addition to implicitly calling the make build sequence, Nix was quietly controlling the build process through environment variables.

To test my theory, I replaced the default installPhase section of mkDerivation with one that dumped all of the environment variables:

{
  xpdf = pkgs.stdenv.mkDerivation rec {
    ...
    installPhase = ''
      printenv
      make install
    '';

I then re-ran nix build with verbose logging:

nix build -L

Sure enough, I saw that Nix pointed CMake to the FreeType headers via the CMAKE_INCLUDE_PATH variable:

CMAKE_INCLUDE_PATH=/nix/store/rmqyzrzpz2kzmn8329bc4fjmzvd33ylw-freetype-2.13.2-dev/include:...

And the reason it hadn’t scribbled over my /usr/bin directory was that Nix told CMake to install in a Nix-specific install directory:

cmakeFlags=...-DCMAKE_INSTALL_BINDIR=/nix/store/7w4ql3kdrl3c0knnvx3lxsnrqfzfcy34-xpdf-4.05/bin

This aspect of Nix’s behavior is a double-edged sword. When it works, it feels magical that Nix figured out the build process without me having to hold its hand. But if it hadn’t worked, I’d have to debug the issue through Nix’s opaque abstractions.
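
The printenv dump is long, so it helps to filter the build log down to the variables that steer CMake. Here is a minimal sketch of such a filter. The function name show_cmake_settings is hypothetical, and the pattern only covers the two kinds of variables shown above.

```shell
#!/usr/bin/env bash
# Keep only log lines that set a CMAKE_* variable or cmakeFlags.
# The optional "> " accounts for the "xpdf> " prefix that nix build -L
# adds to each line of build output.
show_cmake_settings() {
  grep -E '(^|> )(CMAKE_[A-Z_]+|cmakeFlags)='
}

# Intended usage (requires Nix and an uncached build):
#   nix build -L 2>&1 | show_cmake_settings
```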

Compiling xpdf with honggfuzz

Now that I can compile xpdf successfully, it’s time to introduce the fuzz testing part of the workflow.

honggfuzz is a Google-maintained fuzz testing tool. It’s a coverage-guided fuzzer, which means that it traces which parts of the target binary execute for a particular test input. When it discovers an input that causes the binary to execute a new code path, it generates more inputs similar to that one, since they have a greater chance of hitting untested behavior.
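
Here is a toy sketch of the coverage-guided idea. It is only an illustration: honggfuzz gets its coverage data from compiler instrumentation, whereas this sketch uses a made-up toy_target function that prints the names of the branches it takes. An input is kept for further mutation only when it produces a combination of branches that no earlier input produced.

```shell
#!/usr/bin/env bash
# Toy illustration of coverage-guided corpus growth (not honggfuzz's code).

# Hypothetical target: reports which branches a given input reaches.
toy_target() {
  local input="$1"
  echo entry
  case "$input" in P*) echo saw_P ;; esac
  case "$input" in PD*) echo saw_PD ;; esac
  case "$input" in PDF*) echo saw_PDF ;; esac
}

corpus=()          # inputs worth mutating further
seen_coverage=""   # newline-separated coverage signatures seen so far

# Keep an input only if its coverage signature is new.
# Returns 0 when the input was added to the corpus, 1 when it was discarded.
consider_input() {
  local input="$1" signature
  signature="$(toy_target "$input" | tr '\n' ',')"
  if printf '%s\n' "$seen_coverage" | grep -Fxq -- "$signature"; then
    return 1
  fi
  seen_coverage="${seen_coverage}${signature}"$'\n'
  corpus+=("$input")
}
```

With this sketch, feeding in X, then Q, then P keeps X and P but discards Q, because Q exercises exactly the same branches as X.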

honggfuzz ships with C and C++ compilers, so compiling xpdf with honggfuzz should be as simple as pointing Nix at honggfuzz’s compilers instead of Nix’s default compilers. To do this, I first modify nativeBuildInputs to include the honggfuzz package, so that it’s available during compilation:

{
  xpdf = pkgs.stdenv.mkDerivation rec {
    ...
    nativeBuildInputs = with pkgs; [
      cmake
      honggfuzz
    ];
}

Okay, now honggfuzz will be available in my build environment, but how do I tell CMake to use the honggfuzz compiler instead of whatever it was using before?

Make and CMake obey the CC and CXX environment variables, which specify, respectively, which C and C++ compilers to use.
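
To illustrate the convention, here is a small sketch with a fake compiler. Everything in it is hypothetical: fake-fuzz-cc is a stand-in that only logs its arguments, and the build function imitates a build system that uses $CC when it is set and falls back to cc otherwise. As far as I understand, a real wrapper like hfuzz-clang adds instrumentation flags and then hands off to clang.

```shell
#!/usr/bin/env bash
# Sketch of how a build picks its compiler from the CC environment variable.

workdir="$(mktemp -d)"

# Hypothetical wrapper compiler: records how it was invoked instead of
# compiling anything.
cat > "$workdir/fake-fuzz-cc" <<EOF
#!/bin/sh
echo "fake-fuzz-cc \$*" >> "$workdir/compiler.log"
EOF
chmod +x "$workdir/fake-fuzz-cc"

# Minimal stand-in for a build system: honor CC if set, else use cc.
build() {
  "${CC:-cc}" -c "$1" -o "${1%.c}.o"
}

# Setting CC for this one call swaps in the wrapper.
CC="$workdir/fake-fuzz-cc" build hello.c
cat "$workdir/compiler.log"
```

The log shows that the wrapper received the compile command, which is the same mechanism that lets a fuzzer's compiler take over a build without changes to the project's build files.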

I can see that honggfuzz ships with compilers called hfuzz-clang and hfuzz-clang++. That sounds promising, but I don’t know where to find those binaries in honggfuzz’s Nix package. I search the package like this:

$ nix build nixpkgs#honggfuzz

$ find -L result -type f -name hfuzz-clang
result/bin/hfuzz-clang

Okay, that tells me that the compilers in honggfuzz’s Nix package are in the bin/ subdirectory.

To tell Nix to build xpdf using the honggfuzz compilers, I point the CC and CXX variables to the right compiler paths:

{
    xpdf = pkgs.stdenv.mkDerivation rec {
      ...

      preConfigure = ''
        export CC=${pkgs.honggfuzz}/bin/hfuzz-clang
        export CXX=${pkgs.honggfuzz}/bin/hfuzz-clang++
      '';
}

If I build with verbose output, I see that Nix is indeed using the honggfuzz compilers:

$ nix build -L
...
xpdf> -- The C compiler identification is Clang 16.0.6
xpdf> -- The CXX compiler identification is Clang 16.0.6
...
xpdf> -- Check for working C compiler: /nix/store/kb9vkjv4admbdixrjyanfb1i9dd3cbmm-honggfuzz-2.6/bin/hfuzz-clang - skipped
...
xpdf> -- Check for working CXX compiler: /nix/store/kb9vkjv4admbdixrjyanfb1i9dd3cbmm-honggfuzz-2.6/bin/hfuzz-clang++ - skipped

At this point, flake.nix should look like this.

Ad-hoc fuzzing in a dev shell

I’ve compiled xpdf using honggfuzz’s compiler, but now I want to get to the fun stuff.

I could set up an elegant command for kicking off fuzzing within my Nix flake, but at this point, I just want to get my hands dirty and start messing around as quickly as possible. To do that, I create a Nix dev shell with all of my tools available.

To create a Nix dev shell, I add the following to my Nix flake:

{
    packages = rec {
        ...
    };

    devShells.default = pkgs.mkShell {
      buildInputs = self.packages.${system}.xpdf.nativeBuildInputs ++ (with pkgs; [
        wget
      ]);

      shellHook = ''
        wget --version | head -n 1
      '';
    };

At this point, flake.nix should look like this.

I enter my Nix dev shell by typing nix develop:

$ nix develop
GNU Wget 1.21.4 built on linux-gnu.

The “GNU Wget” output is from shellHook, which prints the version numbers of the tools available within the shell. The honggfuzz binary is also available:

$ honggfuzz --help 2>&1 | head -n 1
Usage: honggfuzz [options] -- path_to_command [args]

That works because, within mkShell, I specified buildInputs as all the nativeBuildInputs from the xpdf package (cmake and honggfuzz) plus wget, which I want only in my dev shell for downloading PDFs.

Next, I create a directory to hold the PDFs that honggfuzz will use as input. Since this is just experimental, I’m using a temporary directory:

PDF_DIR="$(mktemp --directory)"

Then, I grab a PDF to use as my sample input.

$ PDF_URL='https://www.irs.gov/pub/irs-pdf/fw4.pdf' && \
  wget --directory-prefix="${PDF_DIR}" "${PDF_URL}"

I do one more nix build to ensure that pdftotext is ready to run under the ./result/bin folder:

$ nix build && ./result/bin/pdftotext -v
pdftotext version 4.05 [www.xpdfreader.com]
Copyright 1996-2024 Glyph & Cog, LLC

Finally, it’s the moment of truth. I kick off honggfuzz’s test runner:

$ honggfuzz \
    --input "${PDF_DIR}" \
    -- ./result/bin/pdftotext ___FILE___

Here’s how it works:

--input "${PDF_DIR}": Specifies the directory of input files to mutate.
-- ./result/bin/pdftotext ___FILE___: Specifies the target program to fuzz. ___FILE___ is a placeholder parameter. honggfuzz replaces it with the path to a newly generated file on each execution.
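
Conceptually, the placeholder substitution works something like the sketch below. This is not honggfuzz's actual code, just a shell illustration of the idea; run_with_input and the example file path are names I made up.

```shell
#!/usr/bin/env bash
# Conceptual sketch of placeholder substitution (not honggfuzz's real code).

# Replace ___FILE___ in each argument with the path of the generated input,
# then run the resulting command.
run_with_input() {
  local input_path="$1"; shift
  local args=() arg
  for arg in "$@"; do
    args+=("${arg//___FILE___/$input_path}")
  done
  "${args[@]}"
}

# 'echo' stands in for the fuzz target so the final command line is visible.
run_with_input /tmp/generated-0001.pdf echo pdftotext ___FILE___
# prints: pdftotext /tmp/generated-0001.pdf
```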

I run the command and am greeted by the honggfuzz fuzzing interface:

honggfuzz shows a terminal UI to display fuzz testing progress

It worked! I could let honggfuzz run for a few days to see if it catches anything, but I want to polish the workflow a bit more to increase the probability of finding bugs.

Next: Using Nix to find an unpatched bug in xpdf

At this point, I’ve shown how to use Nix and honggfuzz to perform basic fuzz testing of the xpdf PDF reader.

In my follow-up post, I’ll show how to:

Automate the complete fuzzing workflow.
Gather tricky PDFs that are more likely to cause crashes.
Find an unpatched bug in the latest version of xpdf.

Read on below:

Using Nix to Fuzz Test a PDF Parser (Part Two)

Thanks to Antonio Morales for creating the Fuzzing101 tutorial series upon which this work is based.
