Document pipeline
The pipeline clones samples from a GitLab repository, sorts the files into PDFs and other documents, and then runs the appropriate analysis tools on each group.
Tools run in the pipeline
ClamAV, PDFiD, peepdf, jsunpack-n, sctest (shellcode analysis), strings, oledump, olevba
Pipeline workflow
The user uploads samples to the Git branch "sample-source". Concourse polls the Git repository, and changes to the repository trigger the pipeline.
Samples are sorted into two analysis lines: PDFs and other documents. All files are scanned for viruses with ClamAV.
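The split into the two analysis lines can be pictured with a short Python sketch. It assumes, for illustration only, that a sample counts as a PDF when the `%PDF-` header appears within its first 1024 bytes, rather than deciding by file extension; the names `is_pdf` and `sort_samples` are hypothetical and not taken from the pipeline's own job scripts.

```python
from pathlib import Path

# Assumption: classify by content, not extension. PDF readers commonly accept
# the "%PDF-" header anywhere in the first 1024 bytes, so search that window.
PDF_MAGIC = b"%PDF-"
HEADER_WINDOW = 1024


def is_pdf(path: Path) -> bool:
    """Return True if the %PDF- header occurs in the file's first 1024 bytes."""
    with path.open("rb") as f:
        return PDF_MAGIC in f.read(HEADER_WINDOW)


def sort_samples(sample_dir: Path) -> tuple[list[Path], list[Path]]:
    """Split the files in sample_dir into (pdfs, other_documents), sorted by name."""
    pdfs: list[Path] = []
    others: list[Path] = []
    for path in sorted(p for p in sample_dir.iterdir() if p.is_file()):
        (pdfs if is_pdf(path) else others).append(path)
    return pdfs, others
```

Searching the first kilobyte rather than only offset 0 also routes PDFs with bytes prepended to the header into the PDF line.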
The Document pipeline at Gitlab
PDF analysis
After the virus scan, PDFs are analysed with PDFiD and peepdf. PDFiD looks for specific PDF keywords and reports whether the file contains something suspicious, such as JavaScript (/JS, /JavaScript) or an action executed when the document is opened (/OpenAction, /AA).
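As a rough illustration of keyword triage, the sketch below counts a handful of PDF names in the raw file bytes and flags the file if any script- or action-related name appears. The keyword subset and the function names are assumptions chosen for the example; this is not PDFiD's code, and unlike PDFiD it does not normalize hex-escaped names such as `/J#61vaScript`.

```python
import re

# Illustrative subset of the names PDFiD counts; the real tool tracks more
# and also normalizes hex-escaped names, which this sketch does not.
KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"]
ACTIVE = {"/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch"}


def count_keywords(data: bytes) -> dict[str, int]:
    """Count occurrences of each keyword as a whole PDF name."""
    counts = {}
    for kw in KEYWORDS:
        # The lookahead stops "/JS" from matching inside a longer name like "/JSx".
        pattern = re.escape(kw.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, data))
    return counts


def looks_suspicious(counts: dict[str, int]) -> bool:
    """Flag the file if any script- or action-related keyword is present."""
    return any(counts[kw] > 0 for kw in ACTIVE)
```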
Like PDFiD, peepdf performs basic analysis, but it also checks whether the sample's hash is found on VirusTotal and recognized as malware.
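peepdf performs the VirusTotal check itself; the sketch below only illustrates what a hash lookup involves. It assumes VirusTotal's public API v3 file-report endpoint, which requires an API key, and the helper names are hypothetical. peepdf's own check may use a different API version or hash type.

```python
import hashlib
import urllib.request

# Assumption: VirusTotal's public API v3 file-report endpoint; peepdf's own
# check may use a different API version or hash type.
VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def sha256_of(path: str) -> str:
    """Hash the file in 64 KiB chunks so large samples are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_lookup_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the file report."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash), headers={"x-apikey": api_key}
    )
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON report whose `data.attributes.last_analysis_stats.malicious` field counts the engines that flagged the file; a hash VirusTotal has never seen yields HTTP 404, raised as `urllib.error.HTTPError`.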
Next, jsunpack-n unpacks JavaScript from the samples. If shellcode is found, it is extracted to a "shellcode" folder and converted to binary format.
Finally, the extracted shellcode is analysed with peepdf's sctest, which examines the shellcode binary and tries to show what its purpose is.
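Malicious PDFs often carry shellcode in JavaScript as a string of `%uXXXX` escapes passed to `unescape()`. Assuming that encoding (jsunpack-n handles more cases than this), converting it to binary amounts to emitting each 16-bit unit in little-endian byte order. The helper names and destination path are hypothetical.

```python
import re

# Assumption: the shellcode is a JavaScript string written as %uXXXX escapes,
# e.g. unescape("%u9090%u..."); any other characters are ignored here.
_U_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})")


def unescape_u(js_literal: str) -> bytes:
    """Convert %uXXXX escapes to raw bytes, each 16-bit unit in little-endian
    order, matching how the string's UTF-16 code units sit in memory on x86."""
    out = bytearray()
    for match in _U_ESCAPE.finditer(js_literal):
        out += int(match.group(1), 16).to_bytes(2, "little")
    return bytes(out)


def save_shellcode(js_literal: str, dest: str) -> int:
    """Write the decoded bytes to dest (hypothetical path) and return the byte count."""
    data = unescape_u(js_literal)
    with open(dest, "wb") as f:
        f.write(data)
    return len(data)
```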
Document analysis
Other documents are run through "strings" after the virus scan.
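The core of what `strings` does can be approximated in a few lines of Python. This is an illustrative re-implementation, not the tool the job runs, and it covers only single-byte ASCII runs (not the UTF-16 modes selected with `strings -e`).

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long,
    roughly what `strings` prints with its default minimum length of 4."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]
```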
After strings, oledump dumps the data streams found in the samples.
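oledump works on OLE compound files, the container format behind legacy doc, xls and ppt files. A sample's container can be recognized from its first bytes: OLE files start with D0 CF 11 E0 A1 B1 1A E1, while OpenXML documents are ZIP archives starting with `PK\x03\x04`. The sketch shows only that check; the function name and labels are hypothetical, and actually listing streams would require an OLE parser such as the olefile library.

```python
# Signatures of the two container formats Office documents use.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 / Compound File Binary
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (OpenXML)


def container_type(data: bytes) -> str:
    """Classify a sample by its leading bytes (labels are hypothetical)."""
    if data.startswith(OLE_MAGIC):
        return "ole"  # e.g. legacy .doc, .xls, .ppt
    if data.startswith(ZIP_MAGIC):
        return "zip"  # e.g. .docx, .xlsm; VBA macros sit in an embedded vbaProject.bin
    return "other"
```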
The last job in this line, olevba, parses the OLE/OpenXML files to detect VBA macros and extract their source code. It also detects suspicious patterns such as auto-executable macros, suspicious VBA keywords, anti-sandboxing and anti-virtualization techniques, and IOCs. It can also decode obfuscation methods such as Hex encoding, StrReverse, Base64 and Dridex string obfuscation.
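To make two of these ideas concrete, here is a Python sketch that spots auto-executable entry points in VBA source and tries simple decodings (StrReverse, hex, Base64) of a suspicious string literal. The keyword list is a small illustrative subset and the function names are hypothetical; this is not olevba's implementation, and Dridex-style obfuscation is not covered.

```python
import base64
import re

# A few auto-exec entry points of the kind olevba flags (illustrative subset).
AUTOEXEC = ("AutoOpen", "Document_Open", "Workbook_Open", "AutoExec")


def find_autoexec(vba_source: str) -> list[str]:
    """Return the auto-exec names that appear as whole words (case-insensitive)."""
    return [name for name in AUTOEXEC
            if re.search(r"\b%s\b" % name, vba_source, re.IGNORECASE)]


def decode_candidates(literal: str) -> dict[str, str]:
    """Try simple decodings of a VBA string literal; only valid ones are returned."""
    results = {"strreverse": literal[::-1]}
    if re.fullmatch(r"(?:[0-9A-Fa-f]{2})+", literal):
        results["hex"] = bytes.fromhex(literal).decode("latin-1")
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        results["base64"] = base64.b64decode(literal, validate=True).decode("latin-1")
    except ValueError:  # binascii.Error is a subclass of ValueError
        pass
    return results
```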
Results
All jobs create their own logs, and the final job creates a summary report. These can be viewed in the "results" branch. An example of a summary report is available here.
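The report format is not described here, so the following is only a sketch of one way per-job logs could be rolled into a summary. It assumes, hypothetically, one `<job>.log` file per job in a results directory, and uses the `FOUND` marker that clamscan prints next to detections as an example of a line worth surfacing.

```python
from pathlib import Path


def summarize(results_dir: Path, marker: str = "FOUND") -> str:
    """Build a plain-text summary: one header per job log (hypothetical
    <job>.log layout), followed by the lines that contain marker."""
    sections = []
    for log in sorted(results_dir.glob("*.log")):
        lines = log.read_text(errors="replace").splitlines()
        hits = [line for line in lines if marker in line]
        sections.append(f"== {log.stem} ({len(hits)} hit(s))")
        sections.extend(f"  {line}" for line in hits)
    return "\n".join(sections)
```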
Setting up the pipeline
The easiest way to set up the pipeline is to follow the pilot environment instructions.
Job descriptions
job-sort-files
Files are sorted by type into PDF and non-PDF files.
job-clamscan
All samples are scanned with ClamAV.
triggered after: job-sort-files
job-strings
Runs samples through "strings".
triggered after: job-sort-files
job-pdfid
Analyses PDF files with PDFiD.
triggered after: job-clamscan
job-peepdf-virustotal-check
Analyses PDF files with peepdf and checks whether the file's hash is found on VirusTotal.
triggered after: job-clamscan
job-jsunpackn
Unpacks JavaScript from PDF files with jsunpack-n. Possible shellcode is extracted to "results/shellcode".
triggered after: job-pdfid & job-peepdf-virustotal-check
job-sctest
Analyses the shellcode that jsunpack-n extracted to "results/shellcode", using sctest.
triggered after: job-jsunpackn
job-oledump
Dumps data streams of OLE files (doc, xls, ppt...).
triggered after: job-strings
job-olevba
Parses OLE and OpenXML files to detect VBA macros and extract their source code.
triggered after: job-oledump
job-generate-report
Creates the final summary report in the "results" branch.
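The "triggered after" lines define a dependency graph between the jobs. The sketch below models those relations and derives one valid run order with Python's `graphlib` (3.9+). In Concourse itself such relations are typically expressed with `passed:` constraints; this code only models them. The trigger for job-generate-report is not listed, so its upstream jobs here are an assumption.

```python
from graphlib import TopologicalSorter

# "triggered after" relations from the job descriptions (job -> upstream jobs).
DEPENDENCIES = {
    "job-sort-files": set(),
    "job-clamscan": {"job-sort-files"},
    "job-strings": {"job-sort-files"},
    "job-pdfid": {"job-clamscan"},
    "job-peepdf-virustotal-check": {"job-clamscan"},
    "job-jsunpackn": {"job-pdfid", "job-peepdf-virustotal-check"},
    "job-sctest": {"job-jsunpackn"},
    "job-oledump": {"job-strings"},
    "job-olevba": {"job-oledump"},
    # Assumption: the report runs once both analysis lines have finished.
    "job-generate-report": {"job-sctest", "job-olevba"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the jobs in an order where every job follows its upstream jobs."""
    return list(TopologicalSorter(deps).static_order())
```

Jobs with no path between them, such as job-pdfid and job-peepdf-virustotal-check, may appear in either order, which is why they can run in parallel.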