Analysing malicious PDF documents using Dockerized tools
Writer: Vesa Vertainen, Project Engineer, JAMK University of Applied Sciences
One of the goals of the CinCan project is to provide tools that automate the repetitive tasks of malware analysis. The project borrows practices familiar from continuous integration so that analyses and threat intelligence can be created, augmented, correlated and shared rapidly. Docker containers make the tools portable and easy to configure for use in designated toolchains.
In order to test our Dockerized PDF analysis tools, we created a “malicious” PDF document using Metasploit. The document automatically runs an executable when opened. Let’s see what we discovered using these tools.
First, we used PDFID, a tool by Didier Stevens, with the triage plugin. The plugin scores samples as 1.0 (likely malicious) or 0.0 (clean), and sometimes something in between. The file sample.pdf is located in /samples, so the Docker command to run is:
$ docker run --rm -v /samples:/samples cincan/pdfid /samples/sample.pdf -p plugin_triage
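When the same check has to be repeated for several files, the command above can be generated from a script. The following is a minimal Python sketch, not part of the CinCan tooling: the helper names pdfid_triage_command and run_triage and the second file name are made up for illustration, and the sketch assumes that Docker is installed and that the sample directory is mounted at /samples inside the container, as in the command above.

```python
import subprocess
from typing import List

# Mount point inside the container, as in the command above.
CONTAINER_DIR = "/samples"


def pdfid_triage_command(host_dir: str, sample_name: str) -> List[str]:
    """Build the docker argument list for one PDFID triage run."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{CONTAINER_DIR}",
        "cincan/pdfid",
        f"{CONTAINER_DIR}/{sample_name}",
        "-p", "plugin_triage",
    ]


def run_triage(host_dir: str, sample_name: str) -> str:
    """Run the container and return whatever PDFID printed (requires Docker)."""
    result = subprocess.run(
        pdfid_triage_command(host_dir, sample_name),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Dry run: print the commands for two file names without executing them.
# "another.pdf" is a hypothetical second sample.
for name in ["sample.pdf", "another.pdf"]:
    print(" ".join(pdfid_triage_command("/samples", name)))
```

Building the argument list separately from running it makes it easy to print and check each command before anything is executed.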
We also wanted to see what results PDFExaminer gives for our PDF. PDFExaminer is a free online scanner that also has an API. In addition to a summary of results, it can output several other formats, such as XML and JSON. To send in the sample and request the summary, we ran the cincan/pdfexaminer image:
$ docker run --rm -v /samples:/samples cincan/pdfexaminer /samples/sample.pdf summary
Next, we unpacked the sample with jsunpack-n and wrote its output to /samples/output:
$ docker run --rm -v /samples:/samples cincan/jsunpack-n /samples/sample.pdf -d /samples/output
The JavaScript unescape function translates the Unicode-escaped string into binary. The first shellcode file (shellcode_e6527…) generated by jsunpack-n contains the shellcode from the script in binary format, which makes it easier to analyse.
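To illustrate what unescape does with such a string, here is a small Python sketch. It is not the code jsunpack-n uses. It only handles %uXXXX escapes and assumes the usual in-memory layout of JavaScript strings, where each 16-bit code unit is stored little-endian. The function name and the example string are made up for illustration.

```python
import re
import struct


def unescape_to_bytes(escaped: str) -> bytes:
    """Turn a %uXXXX-escaped string into the bytes it occupies in memory.

    Assumption: only %uXXXX escapes are present. Anything else in the
    input is ignored by this sketch.
    """
    code_units = re.findall(r"%u([0-9a-fA-F]{4})", escaped)
    # Each escape is one 16-bit code unit, written low byte first.
    return b"".join(struct.pack("<H", int(unit, 16)) for unit in code_units)


# Hypothetical input: two NOPs followed by the code unit 0x41eb.
print(unescape_to_bytes("%u9090%u41eb").hex())  # prints 9090eb41
```

Note how %u41eb ends up as the bytes eb 41: the byte order is swapped, which is why escaped shellcode looks scrambled when compared with the binary file.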
To look at the shellcode more closely, we started peepdf in interactive mode:
$ docker run --rm -v /samples:/samples -ti cincan/peepdf -i
Now we can analyse the shellcode binary with sctest. The result shows that the shellcode launches an executable file.
The shellcode can also be converted into an executable with shellcode2exe:
$ docker run --rm -v /samples:/samples cincan/shellcode2exe -u /samples/input/sample-code
PDFID malware recognition test
We also ran batch scans with PDFID to see how well it recognizes malicious PDF documents. The test set consisted of 8999 clean and 10980 malicious documents, with the following results:
| | Likely malicious | Likely clean | Requires more analysis |
|---|---|---|---|
| Malware samples (10980) | 10972 | 0 | 8 |
| Clean samples (8999) | 3649 | 3612 | 1738 |
From this table we can see that PDFID identifies malware almost perfectly: not a single malicious document is reported as clean. On the other hand, only about 40% of the clean files are identified as clean. This says less about the performance of PDFID than about the complexity of malware recognition.
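The shares behind these observations can be recomputed from the table. The Python sketch below only redoes the arithmetic. The dictionary keys and the helper name are illustrative labels, not PDFID output.

```python
# Counts copied from the results table above.
counts = {
    "malware": {"likely_malicious": 10972, "likely_clean": 0, "needs_analysis": 8},
    "clean": {"likely_malicious": 3649, "likely_clean": 3612, "needs_analysis": 1738},
}


def shares(row):
    """Return each verdict as a percentage of the row total, to one decimal."""
    total = sum(row.values())
    return {verdict: round(100 * n / total, 1) for verdict, n in row.items()}


for sample_set, row in counts.items():
    print(sample_set, sum(row.values()), shares(row))
```

For the clean samples this gives roughly 40.5% flagged as likely malicious, 40.1% identified as clean and 19.3% requiring more analysis. For the malware samples, about 99.9% are flagged as likely malicious.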
There are many approaches to PDF analysis. This tutorial shows a few selected tools that are already available on the CinCan project's Docker Hub page. All tools can be downloaded from hub.docker.com/u/cincan.