Carving Images for Leisure and Gain

Posted on 2021-01-26 by Josiah Smith

Throughout InQuest's research into detecting maldocs, due attention has been given to the graphical asset used as the coercive lure. From "Worm Charming" and InQuest's Malware Lures Gallery to Optical Character Recognition inspection of the text that instructs the user to enable embedded logic, countless wins have been brought to the community's attention. This quick blog details a couple of approaches for acquiring maldoc images without the need to open the document and copy the image.

Some different approaches to image acquisition:

When images are sourced directly from a document, several tools and techniques can extract them, but the effort and effectiveness depend on the specific file type. The .docx and .docm formats are really zip archives in disguise, and their images are found within the word/media directory of the unzipped contents. The following bash command loops through the Word documents in the current directory and extracts the images of each into its own newly created directory.

for L in *.doc[xm]; do mkdir -p "$L"-images && unzip "$L" "word/media/*" -d "$L"-images; done

The process to extract the images is very similar for .xlsx files. The difference is that they are stored in the “xl/media/” portion of the unzipped document. The following script can automate the extraction of images from multiple files.

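#!/bin/bash
# Extract embedded images from every .xlsx file in the current directory.
for L in *.xlsx
do
    # One output directory per workbook, e.g. report.xlsx-images
    mkdir -p "$L"-images
    unzip "$L" "xl/media/*" -d "$L"-images
done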
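
For readers who prefer to stay in Python, the same idea can be sketched with the standard zipfile module. This is a minimal sketch rather than part of the original workflow: the function name extract_media and the inclusion of ppt/media/ are assumptions made for illustration, and unlike unzip it flattens the media files directly into the output directory.

```python
import zipfile
from pathlib import Path

# Media folders used by Word, Excel, and PowerPoint OOXML packages.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/")


def extract_media(document, out_dir=None):
    """Copy every embedded media file out of an OOXML document.

    Returns the list of paths written. out_dir defaults to
    "<document>-images", mirroring the shell examples.
    """
    document = Path(document)
    out_dir = Path(out_dir) if out_dir else Path(f"{document}-images")
    written = []
    with zipfile.ZipFile(document) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith(MEDIA_PREFIXES):
                continue
            # Keep only the base name so a hostile archive cannot
            # write outside out_dir through "../" path components.
            target = out_dir / Path(info.filename).name
            out_dir.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(target)
    return written
```

Calling extract_media("sample.docx") would write any images to sample.docx-images/ and return their paths. A file that is not a zip archive raises zipfile.BadZipFile.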

While image extraction from the previous file types proved trivial, extraction from other file types like .doc can be a bit more challenging due to the streams associated with the Object Linking and Embedding (OLE) file format. Ogden, Roberts, and Sayre describe OLE as “a complex binary format that is not easily manipulated”, but note the flexibility of the compressed archive formats for manipulating a file's contents.
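
Since file extensions can be misleading, it can help to check which container a sample really uses before choosing an extraction path. The following is a small sketch, not something from the original analysis: container_type is a hypothetical helper, and it relies only on the well-known leading bytes of OLE compound files and zip archives.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header
ZIP_MAGIC = b"PK\x03\x04"                      # zip local file header


def container_type(path):
    """Return "ole", "zip", or "unknown" based on the file's first bytes."""
    with open(path, "rb") as fp:
        head = fp.read(8)
    if head.startswith(OLE_MAGIC):
        return "ole"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

A result of "zip" suggests the unzip approach above applies, while "ole" points toward the stream-based techniques that follow.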

Another favorite tool that assists in carving images out of documents is Foremost. Initially developed by the United States Air Force Office of Special Investigations and the Naval Post-Graduate School Center for Cybersecurity and Cyber operations, Foremost is a console program that recovers files based on their headers, footers, and internal data structures. While one practical use case, shown in Persian Kitties Hiding Benign Executables, is the extraction of executables embedded within images, here the carving targets images inside documents. Looking at the malicious document eaee308614ca488da32275fd7f153ec5 masquerading as a FEMA grant notice, Foremost carves out one image.

$ foremost eaee308614ca488da32275fd7f153ec5 && md5sum output/{png,jpg}/* 2> /dev/null
Processing: eaee308614ca488da32275fd7f153ec5
|foundat=_rels/.rels ▒(▒
*|
90b376b2dfe0347ab81f5f952839a76c  output/jpg/00000013.jpg
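
To illustrate the header-matching idea behind this kind of carving, here is a rough Python sketch that only reports where known image signatures occur in a blob of bytes. It is not Foremost's implementation, which also relies on footers and format-specific logic; the names SIGNATURES and find_signatures are made up for this example.

```python
SIGNATURES = {
    "jpg": b"\xff\xd8\xff",        # JPEG start-of-image marker
    "png": b"\x89PNG\r\n\x1a\n",   # PNG file signature
}


def find_signatures(data):
    """Return sorted (offset, kind) pairs for every image header in data."""
    hits = []
    for kind, magic in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, kind))
            start = data.find(magic, start + 1)
    return sorted(hits)


blob = b"AA\xff\xd8\xff\xe0BB\x89PNG\r\n\x1a\nCC"
print(find_signatures(blob))  # [(2, 'jpg'), (8, 'png')]
```

A header hit is only a candidate. Short signatures such as the JPEG marker can appear by chance in compressed data, which is one reason dedicated carvers validate more of the structure.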


Another approach to carving out the images found within OLE files is to use Didier Stevens's oledump.py. However, this technique is considerably more difficult, as the stream containing the image is unknown at the beginning of the analysis and, on occasion, some data or other chaff bytes sit ahead of the image's magic bytes.

$ oledump.py -d 86924f047f824dbb7663a1eaf6c0edd6dc7ad32af272e1f6cf720d264b58ae10
  1:       121 '\x01CompObj'
  2:      4096 '\x05DocumentSummaryInformation'
  3:      4096 '\x05SummaryInformation'
  4:     14433 '1Table'
  5:    125765 'Data'
  6:        97 'Macros/GSADFG354DFG/\x01CompObj'
  7:       296 'Macros/GSADFG354DFG/\x03VBFrame'
  8:       679 'Macros/GSADFG354DFG/f'
  9:       728 'Macros/GSADFG354DFG/o'
 10:       730 'Macros/PROJECT'
 11:       164 'Macros/PROJECTwm'
 12:        97 'Macros/RY546TYUww/\x01CompObj'
 13:       291 'Macros/RY546TYUww/\x03VBFrame'
 14:       131 'Macros/RY546TYUww/f'
 15:        76 'Macros/RY546TYUww/o'
 16: M    9198 'Macros/VBA/BVN456gRT'
 17: M    1342 'Macros/VBA/GSADFG354DFG'
 18: M   36117 'Macros/VBA/QWQWX6'
 19: m    1173 'Macros/VBA/RY546TYUww'
 20: M    3972 'Macros/VBA/ThisDocument'
 21:      8806 'Macros/VBA/_VBA_PROJECT'
 22:      9082 'Macros/VBA/__SRP_0'
 23:      1790 'Macros/VBA/__SRP_1'
 24:       430 'Macros/VBA/__SRP_4'
 25:      5145 'Macros/VBA/__SRP_5'
 26:      1115 'Macros/VBA/dir'
 27:      4096 'WordDocument'

Demonstrating with the file md5: 13b9709cb87044025a9fc7bc8521a028, it is apparent that some bytes need to be removed before the image can be rewritten. Furthermore, although the image data in this sample sits in stream 5 (Data), that position is not known in advance, so the correct stream has to be identified. Note the PNG header beginning at offset 0xE1, in the row labeled 000000E0.

$ oledump.py -a -s 5 86924f047f824dbb7663a1eaf6c0edd6dc7ad32af272e1f6cf720d264b58ae10 | more
00000000: 45 EB 01 00 44 00 64 00  00 00 00 00 00 00 08 00  E...D.d.........
00000010: 00 00 00 00 00 00 00 00  00 00 00 00 17 4A 0E 65  .............J.e
00000020: 76 02 8D 02 00 00 00 00  00 00 00 00 00 00 00 00  v...............
00000030: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
00000040: 00 00 00 00 0F 00 04 F0  50 00 00 00 B2 04 0A F0  ........P.......
00000050: 08 00 00 00 01 04 00 00  00 0A 00 00 43 00 0B F0  ............C...
00000060: 2C 00 00 00 04 41 01 00  00 00 05 C1 14 00 00 00  ,....A..........
00000070: 06 01 02 00 00 00 FF 01  00 00 08 00 64 00 73 00  ............d.s.
00000080: 62 00 75 00 66 00 66 00  65 00 72 00 32 00 00 00  b.u.f.f.e.r.2...
00000090: 00 00 10 F0 04 00 00 00  00 00 00 80 62 00 07 F0  ............b...
000000A0: A1 EA 01 00 06 06 F9 AF  C6 50 BE 2C 3F A4 23 34  .........P.,?.#4
000000B0: 38 12 1D 17 20 65 FF 00  7D EA 01 00 01 00 00 00  8... e..}.......
000000C0: 44 00 00 00 00 00 4F 00  00 6E 1E F0 75 EA 01 00  D.....O..n..u...
000000D0: F9 AF C6 50 BE 2C 3F A4  23 34 38 12 1D 17 20 65  ...P.,?.#48... e
000000E0: FF 89 50 4E 47 0D 0A 1A  0A 00 00 00 0D 49 48 44  ..PNG........IHD
000000F0: 52 00 00 09 E0 00 00 0D  78 08 06 00 00 00 F1 82  R.......x.......
00000100: A1 8A 00 00 00 01 73 52  47 42 00 AE CE 1C E9 00  ......sRGB......

Overcoming this uncertainty is simple enough with some scripting skills. The following Python script will match, carve, and write the embedded PNG image back into the current working directory.

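#!/usr/bin/env python3
"""Find a PNG signature in a file and carve everything from it onward."""
import sys

PNG_MAGIC = b"\x89\x50\x4e\x47\x0d\x0a\x1a\x0a"

with open(sys.argv[1], "rb") as fp:
    contents = fp.read()

# Skip any chaff bytes that precede the PNG signature.
start = contents.find(PNG_MAGIC)
if start == -1:
    sys.exit("No match")

# Named to match the `file carved_image` check that follows.
with open("carved_image", "wb") as output:
    output.write(contents[start:])
print("PNG Image Found")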
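
The script above writes everything from the signature to the end of the input, so any bytes that trail the image in the stream come along too. Most viewers tolerate that, but when an exact carve matters (for hashing, say), one option is to walk the PNG chunk structure and stop after IEND. The following is a sketch under that assumption; carve_png is a hypothetical name and CRCs are not verified.

```python
import struct

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def carve_png(data):
    """Return the bytes of the first complete PNG in data, or None."""
    start = data.find(PNG_MAGIC)
    if start == -1:
        return None
    pos = start + len(PNG_MAGIC)
    while pos + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        # Each chunk is: length (4), type (4), data (length), CRC (4).
        pos += 8 + length + 4
        if pos > len(data):
            return None  # chunk runs past the end: truncated image
        if kind == b"IEND":
            return data[start:pos]
    return None  # ran out of data before reaching IEND
```

Given the contents of a dumped stream, carve_png(contents) returns just the image bytes, with both the leading chaff and any trailing data removed.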

There is still the problem of knowing which stream holds the image data, but a simple Bash loop will write out every stream so that each one can be run through the PNG image finder.

$ for L in $(oledump.py -d 86924f047f824dbb7663a1eaf6c0edd6dc7ad32af272e1f6cf720d264b58ae10 | cut -d: -f1); do oledump.py 86924f047f824dbb7663a1eaf6c0edd6dc7ad32af272e1f6cf720d264b58ae10 -d -s$L > stream$L; done

$ ls stream* | sort -n | while read L; do ./pngfind.py $L; done
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
PNG Image Found
No match
No match
No match
No match
$ file carved_image && md5sum carved_image
carved_image: PNG image data, 2528 x 3448, 8-bit/color RGBA, non-interlaced
cf6e0d01207b8dd70e37cfc66981e6df  carved_image
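
One wrinkle in the loop above: because the file names start with the word "stream", sort -n appears to fall back to plain lexical ordering (stream1, stream10, ..., stream2, ...), so the single hit on the 23rd line of output corresponds to stream5 rather than stream23. A small Python sketch can report the matching file name directly. It assumes the stream files were dumped into one directory as above; scan_streams and stream_number are hypothetical names.

```python
import hashlib
import re
from pathlib import Path

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def stream_number(path):
    """Sort key: the integer suffix of a file named like "stream12"."""
    match = re.search(r"(\d+)$", path.name)
    return int(match.group(1)) if match else -1


def scan_streams(directory="."):
    """Yield (name, offset, md5) for each dumped stream that holds a PNG."""
    for path in sorted(Path(directory).glob("stream*"), key=stream_number):
        if not path.is_file():
            continue
        data = path.read_bytes()
        offset = data.find(PNG_MAGIC)
        if offset != -1:
            # Hash the bytes from the signature to the end of the stream.
            digest = hashlib.md5(data[offset:]).hexdigest()
            yield path.name, offset, digest


if __name__ == "__main__":
    for name, offset, digest in scan_streams():
        print(f"{name}: PNG at offset {offset:#x}, md5 {digest}")
```

Each output line names the stream file and gives the offset of the PNG signature, along with the MD5 of the bytes from that offset onward.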

The resulting image was found in the Data stream (stream 5), dumped with oledump.py, and carved out after removing the chaff bytes at the head of the stream. (cropped for size)

Tags
threat-hunting threat-intel in-the-wild