String Encoding and YARA... Oh My
On December 16th, 2020, Twitter user Insomnihack @pro_integritate posted an interesting obfuscated document here, which some sandboxes flagged as Dridex.
This sample threw an error and would not open in Office 2010 until I changed the file extension to “doc”.
The thing that stood out the most on initial inspection is the heavy use of properties starting with “wd”, such as “wdArtWeavingStrips”. Each of these names maps to a constant value among the “Word Enumerated Constants” found here on the Microsoft developer page.
On December 22nd, InQuest posted another sample on Twitter here with the same formatting.
After extracting all of the VBA, combining it into one file, and highlighting the “wd”, we can see there are many different values.
So I created a tool to extract the values from the combined sheet.
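As a rough idea of what such an extraction tool can look like, here is a minimal Python sketch (not the tool used for this post). It assumes the constants always appear as “wd” followed by a capitalized identifier; the function name and the sample line are made up for illustration.

```python
import re

# Assumption: every constant is "wd" followed by a capitalized identifier.
WD_NAME = re.compile(r"\bwd[A-Z][A-Za-z0-9]*\b")

def extract_wd_names(vba_source: str) -> list[str]:
    """Return the unique wd* identifiers in the order they first appear."""
    seen = {}
    for match in WD_NAME.finditer(vba_source):
        seen.setdefault(match.group(0), None)
    return list(seen)

# Made-up VBA fragment, only to exercise the function.
sample = "x = wdArtWeavingStrips + wdAlertsAll\ny = wdArtWeavingStrips - 3"
names = extract_wd_names(sample)
print(names)
```

Writing `names` out one per line gives a list that can be diffed against the list from another sample.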
Out of curiosity, I wondered how the two files compared and whether they were using the same values.
After fixing a bug in my diff tool, we can see many differences between the two lists.
Comparing the two lists gives the following.
List 1 is the list from Insomnihack, and list 2 is from the InQuest file. Looking at the diff, they only share 7 values.
Looking at the reverse diff, it still only shows 7 shared values.
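The same comparison can be sketched with Python sets. This is not the diff tool used above, and the two short lists are made up for illustration. It also shows why the reverse diff cannot change the shared count: the intersection is the same in both directions.

```python
def compare_lists(list_a, list_b):
    """Split two lists of names into shared and one-sided entries."""
    a, b = set(list_a), set(list_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }

# Made-up lists standing in for the two samples.
list1 = ["wdAlertsAll", "wdAlertsNone", "wdArtWeavingStrips"]
list2 = ["wdArtWeavingStrips", "wdAlertsMessageBox"]

forward = compare_lists(list1, list2)
reverse = compare_lists(list2, list1)
print(forward["shared"], reverse["shared"])
```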
We can combine the two lists to get a larger sample to search for and build a YARA rule.
So thanks to Josiah Smith @JosiahSecurity, we can turn a list of strings into a YARA rule using the “novel_rule_generator.py” located here.
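To show the general idea of going from a list of strings to a rule, here is a small Python sketch. It is not how “novel_rule_generator.py” works internally; the function name, the rule name and the “any of them” condition are all assumptions for illustration.

```python
def build_yara_rule(rule_name, strings):
    """Render a bare-bones YARA rule with one text string per entry."""
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for i, text in enumerate(sorted(set(strings))):
        # Escape backslashes and double quotes for a YARA text string.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}"')
    # Assumption: the loosest possible condition, tighten as needed.
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)

rule = build_yara_rule("wd_constants", ["wdArtWeavingStrips", "wdAlertsAll"])
print(rule)
```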
Looking at several other samples and doing a diff on them, we can see the list keeps getting bigger. Some values did not show up on the list in the first link. So we need to get the full list.
It appears that later versions of Office added new constant/value pairs.
With some searching around we find the open-source Office VBA documentation located here.
Trying to look the values up online will truncate the list, so we have to download the full source and search it for the values.
Searching for the strings of interest in the downloaded code will lead us to the path “VBA-Docs-master\api\Word.Wd[name].md”. So go to “VBA-Docs-master\api”, then scroll down the list until you get to a file like “Word.WdAlertLevel.md”.
Looking in the first file, we see the name, value, description separated by the “pipe” symbol.
There were 300 separate files with the “wd” prefix, so I extracted them to a different folder and built a parser to extract just the name and value from each file and write them to a single file.
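A parser along these lines can be sketched in Python. This is not the parser used for this post. It assumes each table row looks like “|name|value|description|”, that the name may be wrapped in “**” for bold, and that the value is a plain decimal integer; rows that do not fit are skipped. The sample text is made up in that layout.

```python
import re
from pathlib import Path

# Assumption: rows look like "|**wdName**|123|Description|" (bold optional).
ROW = re.compile(r"^\|?\s*\**(wd\w+)\**\s*\|\s*(-?\d+)\s*\|")

def parse_enum_markdown(text: str) -> dict[str, int]:
    """Collect name/value pairs from one enumeration page."""
    pairs = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            pairs[match.group(1)] = int(match.group(2))
    return pairs

def parse_folder(folder: str) -> dict[str, int]:
    """Merge the pairs from every Word.Wd*.md file in a folder."""
    pairs = {}
    for path in sorted(Path(folder).glob("Word.Wd*.md")):
        pairs.update(parse_enum_markdown(path.read_text(encoding="utf-8", errors="replace")))
    return pairs

sample_page = """|Name|Value|Description|
|:-----|:-----|:-----|
|**wdAlertsAll**|-1|All message boxes and alerts are displayed.|
|**wdAlertsNone**|0|No alerts or message boxes are displayed.|
"""
pairs = parse_enum_markdown(sample_page)
print(pairs)
```

Calling `parse_folder` on the folder holding the extracted files would then give one dictionary covering every enumeration.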
If we look at the list, we have 3,173 name/value pairs.
So for a practical search, the rule needs to look for a subset of them rather than all of them.
The first try at a YARA rule only searched for a small number of these but still returned over 1,000 results. All of them were positive for what we were searching for, but not all of them were of the type we want to narrow down to.
So we look for at least 6 of these in a file.
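The effect of that threshold can be mimicked in Python for quick triage outside of YARA. This is only a sketch: the function names and sample bytes are made up, and like an unanchored YARA string it matches plain substrings.

```python
def count_wd_hits(data: bytes, names) -> int:
    """Count how many distinct constant names occur anywhere in the data."""
    return sum(1 for name in set(names) if name.encode("ascii") in data)

def meets_threshold(data: bytes, names, min_hits: int = 6) -> bool:
    """True when at least min_hits distinct names are present."""
    return count_wd_hits(data, names) >= min_hits

# Made-up data containing two of the three names.
names = ["wdArtWeavingStrips", "wdAlertsAll", "wdAlertsNone"]
data = b"junk wdAlertsAll more junk wdAlertsNone"
hits = count_wd_hits(data, names)
print(hits, meets_threshold(data, names))
```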
What else can we add to the rule to make it more effective?
If we use a hex editor and scroll near the end of the document, we see it has an embedded object — a stylesheet.
So now we can extract some strings from this and build a new YARA rule.
With the full list of possible “wd” values and now the specific values found in the stylesheet, we should significantly reduce the chance of false positives for what we are searching for.
We can also use the list of name/value pairs to do replacements in the code, then do the math.
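Pulling candidate strings out of a binary blob can be sketched in Python, much like the classic “strings” utility. The minimum length of 6 and the sample bytes are assumptions for illustration.

```python
import re

def printable_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]

# Made-up bytes: one long printable run and one that is too short.
blob = b"\x00\x01hello stylesheet\x00ab\x02"
found = printable_strings(blob)
print(found)
```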
Now let’s go back and take a look at how the values are used in the functions.
Here we can see there are a series of functions. These functions will get called and decode the strings for use in the main function.
Let’s take a closer look at how this works.
It uses a series of math operations to build each letter, then assembles the letters into a string.
The first thing we want to do is remove the # symbols for readability, since Notepad++ is getting confused as to what they are.
Next, we will get the replacements.
While going through the constant names, we find an odd one, “vbInformation”, whose value can be found here.
Since I did not take the time to build a tool to do the replacements for me, I have to do them one at a time.
Now that the replacements for the constant values are done, here is what it looks like, along with the list used.
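For anyone who would rather not do this by hand, the substitution can be sketched in Python. The function name and the sample line of code are made up; the three constant values shown are the documented ones. Names missing from the table are left untouched so they are easy to spot.

```python
import re

# Assumption: every constant of interest starts with "wd" or "vb".
CONSTANT = re.compile(r"\b(?:wd|vb)[A-Za-z0-9]+\b")

def substitute_constants(code: str, values: dict[str, int]) -> str:
    """Replace known constant names with their numeric values."""
    def replace(match):
        name = match.group(0)
        if name not in values:
            return name  # unknown name, leave it for manual review
        value = values[name]
        # Parenthesize negatives so "x - wdAlertsAll" stays readable.
        return f"({value})" if value < 0 else str(value)
    return CONSTANT.sub(replace, code)

values = {"wdAlertsAll": -1, "wdAlertsNone": 0, "vbInformation": 64}
# Made-up line in the style of the obfuscated code.
replaced = substitute_constants("Chr(vbInformation + wdAlertsNone + 1)", values)
print(replaced)
```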
The next step is to do the math. It is a pain and error-prone to do it by hand, so let’s build a quick program and let it do the math for us.
We copy & paste the replacement code into VS and go.
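The same step can be sketched in Python instead of VS. This is not the program used above. It assumes each letter comes from a plain arithmetic expression that works out to a whole number, and the sample expressions are made up. It walks the parsed expression instead of calling `eval`, so nothing but arithmetic can run. VBA's `\` integer division is not valid Python and would need rewriting to `//` first.

```python
import ast
import operator

BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
          ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
          ast.Mod: operator.mod}
UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def eval_arith(expr: str):
    """Evaluate an expression made only of numbers and basic operators."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY:
            return UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def decode(expressions) -> str:
    """Turn one arithmetic expression per character into a string."""
    return "".join(chr(int(eval_arith(e))) for e in expressions)

# Made-up expressions, one per character.
decoded = decode(["64 + 0 + 1", "200 / 2 - 2", "(-1) * -67"])
print(decoded)
```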
This is a long-winded way to build a string, but it will be annoying enough to frustrate quite a few analysts.
After spending way too much time decoding these by hand, we find that it will extract the stylesheet and write it to a file, then run the code to download the next stage.
All of this trouble to hide the values may make the code difficult to analyze, but it will stand out like a red flag. This is very specific to this type of “encoding”.
All of this to download the next stage from URLs.
That is it for this time.
Link to Josiah Smith @JosiahSecurity “novel_rule_generator.py”
Link to my GitHub with the various incarnations of the YARA rule, the list of constants only, and the list of constant/value pairs. I’m also adding the one page where the constants were replaced in the screenshot.
I’ve also included the extracted stylesheet.