Day 2 - PDF-parser

Writer: Vesa Vertainen, Project Engineer, JAMK University of Applied Sciences

In the CinCan project, we have dockerized many analysis tools from several authors. One of them is Didier Stevens, who has written a good deal of handy forensics software. One of his PDF tools is PDF-parser, a simple yet versatile tool for identifying the fundamental elements of a PDF document. Let’s look at a few examples of running PDF-parser with the cincan/pdf-parser Docker container.

$ docker run --rm -v $(pwd):/data cincan/pdf-parser /data/testfile.pdf -a

The -a option shows statistics about the objects in the PDF document. These can be used to classify documents and to identify objects of interest. To inspect object 17, which in this example contains an OpenAction entry, you can set the option -o 17, or you can use the search function:

$ docker run --rm -v $(pwd):/data cincan/pdf-parser /data/testfile.pdf -s OpenAction
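Since every command so far repeats the same docker invocation, a small shell function can cut down on typing. This is only a sketch: the function name pdfparser is made up for illustration and is not part of CinCan, and it assumes the PDF sits in the current directory.

```shell
# Hypothetical helper, not part of CinCan: wraps the docker invocation
# used in the examples above.
pdfparser() {
    file="$1"   # name of a PDF file in the current directory
    shift       # everything after the file name goes to pdf-parser unchanged
    docker run --rm -v "$(pwd)":/data cincan/pdf-parser "/data/$file" "$@"
}
```

With the function defined in the current shell, `pdfparser testfile.pdf -o 17` would inspect object 17 directly, and `pdfparser testfile.pdf -s OpenAction` would repeat the search above.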

The tools dockerized in the CinCan project can also be run with the cincan command-line tool, which is considerably simpler. Here is an example that selects a specific element type, the “trailer”, with the option -e t, using the cincan tool:

$ cincan run cincan/pdf-parser /data/testfile.pdf -e t

One very interesting option lets PDF-parser generate Python code that creates a PDF similar to the analyzed one. PDF-parser can do much more besides. Please visit blog.didierstevens.com/programs/pdf-tools/ to learn more and to download the software.
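As a rough sketch of the Python-generating feature: pdf-parser.py documents a -g (--generate) option for it. The function below assumes that the cincan/pdf-parser image passes this flag through unchanged and that the generated program is written to standard output. The function name and the output file name are made up for illustration.

```shell
# Sketch only. Assumptions: the image passes -g (--generate) through to
# pdf-parser.py, and the generated Python program is printed to stdout.
pdf_to_python() {
    # $1: PDF file in the current directory
    # $2: file to write the generated Python program to (hypothetical name)
    docker run --rm -v "$(pwd)":/data cincan/pdf-parser "/data/$1" -g > "$2"
}
```

For example, `pdf_to_python testfile.pdf recreate_testfile.py` would save the generated program next to the analyzed file, where it can be reviewed before anything is run.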

This and many other tools are available from CinCan’s GitLab repository and Docker Hub.