PDFiD/PeePDF/JSunpack-n/shellcode analysis pipeline

This pipeline polls a GitLab repository for changes, clones new PDF samples from it, analyses the documents, and writes the logs to another branch of the repository.

Pipeline workflow:

  1. Poll for new files in git repository

  2. job-show-files: List samples to be analysed

  3. job-pdfid: Analyse the samples, classify them as clean, malicious, or needing more analysis, and save the logs to the repo.

     job-peepdf-virustotal-check: Analyse the samples and query their hashes against the VirusTotal database.

  4. job-jsunpack-n: Analyse the samples, extract JavaScript, and convert any shellcode found to binary. Save the shellcode to the repo's shellcode folder.

  5. job-sctest: Use peepdf's sctest to analyse the shellcode binaries converted by jsunpack-n. Push results to repo.
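Steps 2 and 3 can be tried by hand before the pipeline runs. The sketch below is not one of the pipeline's scripts: list_sample_hashes is a hypothetical helper that lists the PDF samples in a folder together with their SHA-256 hashes, which are the values you would look up in VirusTotal. It assumes GNU coreutils' sha256sum is installed (on macOS, shasum -a 256 is the usual substitute).

```shell
# Hypothetical helper, not part of the pipeline's own scripts.
# Prints "<sha256>  <path>" for every .pdf file under the given folder
# (defaults to pdf/, the sample folder on the pdf-source branch).
list_sample_hashes() {
    sample_dir="${1:-pdf}"
    # find also picks up samples in nested folders; files that do not
    # end in .pdf are skipped. The output is sorted by path.
    find "$sample_dir" -type f -name '*.pdf' -exec sha256sum {} + | sort -k 2
}
```

Run list_sample_hashes pdf from a checkout of the pdf-source branch to see which samples are waiting to be analysed.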

The repository branches

branch: master

  • Contains the scripts for the pipeline

  • The results will be written to results/

branch: pdf-source

  • Place the samples in pdf/

How to set up the pipeline


You can set up the pipeline with sudo ./setup.sh

Or directly from the pilot environment with the command sudo ./setup-pipeline.sh pdf-pipeline


  1. Set up Concourse (tutorial)

  2. Set up a git repository with branch: master, with the files included in the "results" folder.

  3. Set up branch: pdf-source with a folder "pdf" for the samples.

  4. Edit credentials.yml with the details of your git repository and your SSH key.

  5. Log in to Concourse:


  6. Set up the pipeline:

fly -t CONCOURSE_TARGET_NAME sp -c pipeline.yml -p pdfjobs -l credentials.yml

  7. Unpause the pipeline:

fly -t CONCOURSE_TARGET_NAME unpause-pipeline -p pdfjobs

  8. Upload your samples to the pdf folder on the pdf-source branch
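Uploading a sample comes down to committing it to pdf/ on the pdf-source branch and pushing. A minimal sketch, assuming the current directory is a clone of the repository and that pdf-source already exists locally; add_sample is a hypothetical helper, not something shipped with the pipeline:

```shell
# Hypothetical helper: copy one sample into pdf/ on the pdf-source
# branch and commit it. Stops at the first step that fails.
add_sample() {
    sample="$1"
    name=$(basename "$sample")
    git checkout pdf-source &&
    mkdir -p pdf &&
    cp "$sample" "pdf/$name" &&
    git add "pdf/$name" &&
    git commit -m "Add sample $name"
}
```

After add_sample /path/to/sample.pdf, run git push origin pdf-source (assuming the remote is named origin) so that the pipeline picks the new file up on its next poll.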

See demo video: pdfjobs-pipeline.mp4