document pipeline

Pipeline workflow

The user uploads samples to the Git branch "sample-source". Concourse polls the Git repository, and the pipeline is triggered by changes in it.

Samples are sorted into two analysis lines: PDFs and other documents. All files are scanned for viruses with ClamAV.
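
The sorting step could look roughly like the sketch below. How job-sort-files actually detects file types is not described here, so checking for the "%PDF-" signature near the start of the file is an assumption, and the names classify and sort_samples are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: a sample counts as a PDF if the "%PDF-" signature appears
# within the first 1024 bytes. The real job may use another method.
PDF_MAGIC = b"%PDF-"


def classify(head: bytes) -> str:
    """Return "pdf" or "other" based on the leading bytes of a file."""
    return "pdf" if PDF_MAGIC in head[:1024] else "other"


def sort_samples(sample_dir: Path) -> Dict[str, List[Path]]:
    """Group the regular files in sample_dir into the two analysis lines."""
    groups: Dict[str, List[Path]] = {"pdf": [], "other": []}
    for path in sorted(sample_dir.iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                groups[classify(fh.read(1024))].append(path)
    return groups
```

Looking at the content rather than the file extension means a renamed PDF still ends up in the PDF line.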

PDF analysis

After the virus scan, PDFs are analysed with Pdfid and Peepdf. Pdfid searches for certain PDF keywords
and reports whether the file contains something suspicious, such as JavaScript, or executes an action when opened.
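
To illustrate the idea behind the keyword check, here is a minimal sketch that counts a few PDF names in the raw bytes of a file. It is not Pdfid's implementation: the keyword list is an illustrative selection, the function names are hypothetical, and obfuscated names are not handled.

```python
import re
from typing import Dict

# Illustrative selection of names linked to scripts and automatic actions.
# Assumption: this list is for demonstration, not Pdfid's own keyword set.
SUSPICIOUS_KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch"]


def count_keywords(data: bytes) -> Dict[str, int]:
    """Count occurrences of each keyword in the raw PDF bytes."""
    counts = {}
    for keyword in SUSPICIOUS_KEYWORDS:
        # The lookahead stops "/JS" from matching a longer name like "/JSfoo".
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


def looks_suspicious(data: bytes) -> bool:
    """True if any of the listed keywords occurs at least once."""
    return any(count_keywords(data).values())
```

A non-zero count only flags the file for a closer look; it does not prove that the file is malicious.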

Peepdf also runs a basic analysis, and in addition checks whether a sample's hash is found on VirusTotal and recognized as malware.
Samples that are recognized as malware are moved to a separate folder.
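
The hash-and-move step can be sketched as follows. In the pipeline the VirusTotal query is done by Peepdf, so the lookup here is a caller-supplied stand-in; the choice of SHA-256 and the names file_digest and quarantine_if_known are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large samples need not fit in memory."""
    # Assumption: SHA-256; the hash type the real job submits is not stated.
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def quarantine_if_known(path: Path, dest_dir: Path,
                        lookup: Callable[[str], bool]) -> bool:
    """Move the sample to dest_dir if lookup() reports its hash as malware.

    lookup is a hypothetical stand-in for the VirusTotal check.
    """
    if not lookup(file_digest(path)):
        return False
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(dest_dir / path.name))
    return True
```

Passing the lookup in as a function keeps the sketch free of network calls and makes it easy to test with a fixed set of known hashes.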

Jsunpack-n unpacks JavaScript from the samples. If shellcode is found, it is extracted to a "shellcode" folder and converted to binary format.

Finally, the extracted shellcode is analysed with Peepdf's sctest. Sctest tries to analyse the shellcode binary and to show what its purpose is.

Document analysis

Other documents are run through "strings" after the virus scan.
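
For readers unfamiliar with "strings", the sketch below approximates what it does: it pulls runs of printable ASCII characters out of binary data. It is a simplification and not the tool itself; the default minimum length of 4 and the function name extract_strings are assumptions.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII characters."""
    # Bytes 0x20 to 0x7e cover space through "~".
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]
```

In a document sample, such runs often reveal URLs, file paths, or command lines embedded in the file.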

After strings, Oledump dumps data streams found in the samples.

The last job, Olevba, parses the OLE/OpenXML files to detect macros and extract their source code.
Olevba also detects patterns such as auto-executable macros, VBA keywords, anti-sandboxing and anti-virtualization
techniques, and IOCs. It can also decode obfuscation methods such as hex encoding, StrReverse, Base64, and Dridex.
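
Three of the obfuscation methods named above are simple enough to show directly. The sketch below is not Olevba's code; the helper names are hypothetical, and the Dridex encoding is left out.

```python
import base64


def decode_hex(text: str) -> str:
    """Decode a hex-encoded string such as "636d64" into "cmd"."""
    return bytes.fromhex(text).decode("latin-1")


def decode_base64(text: str) -> str:
    """Decode a Base64 string; invalid input raises binascii.Error."""
    return base64.b64decode(text, validate=True).decode("latin-1")


def str_reverse(text: str) -> str:
    """Undo VBA's StrReverse, which simply reverses the string."""
    return text[::-1]
```

Macros use tricks like these to hide strings such as commands or URLs from a plain keyword search, which is why decoding them helps expose IOCs.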

Results

All jobs create their own logs, which can be viewed in the "results" branch.


Job descriptions

job-sort-files

Files are sorted by type into PDF and non-PDF.
output: filetypes.log

job-clamscan

All samples are scanned with ClamAV.
output: clamscan.log
triggered after: job-sort-files

job-strings

Samples are run through "strings".
output: strings.log
triggered after: job-sort-files

job-pdfid

Analyses PDF files.
output: pdfid.log, pdfid-malicious.log/pdfid-clean.log
triggered after: job-clamscan

job-peepdf-virustotal-check

Analyses PDF files. If a sample's hash is found on VirusTotal, the sample is moved to the folder "virustotal".
output: peepdf.log, peepdf-suspicious.log/peepdf-clean.log
triggered after: job-clamscan

job-jsunpackn

Analyses PDF files. Possible shellcode is extracted to "results/shellcode".
output: jsunpackn.log, files in results/shellcode
triggered after: job-pdfid & job-peepdf-virustotal-check

job-sctest

Analyses shellcode from "results/shellcode" extracted by jsunpackn.
output: sctest.log
triggered after: job-jsunpackn

job-oledump

Dumps data streams of OLE files (doc, xls, ppt...).
output: oledump.log
triggered after: job-strings

job-olevba

Parses OLE and OpenXML files to detect VBA macros and extract their source code.
output: olevba.log
triggered after: job-oledump
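
The "triggered after" entries above form a dependency graph. The sketch below copies them into a Python mapping and derives the stages in which jobs could run; it only illustrates the ordering and is not the Concourse configuration. The function name execution_stages is hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each job maps to the set of jobs it is triggered after.
TRIGGERED_AFTER: Dict[str, Set[str]] = {
    "job-sort-files": set(),
    "job-clamscan": {"job-sort-files"},
    "job-strings": {"job-sort-files"},
    "job-pdfid": {"job-clamscan"},
    "job-peepdf-virustotal-check": {"job-clamscan"},
    "job-jsunpackn": {"job-pdfid", "job-peepdf-virustotal-check"},
    "job-sctest": {"job-jsunpackn"},
    "job-oledump": {"job-strings"},
    "job-olevba": {"job-oledump"},
}


def execution_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into stages; jobs in one stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, jobs in enumerate(execution_stages(TRIGGERED_AFTER), start=1):
    print(number, ", ".join(jobs))
```

With the dependencies listed above this yields five stages, starting with job-sort-files and ending with job-sctest.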