This blog post continues our Script Series where the FireEye Labs Advanced Reverse Engineering (FLARE) team shares tools to aid the malware analysis community. Today, we release ironstrings: a new IDAPython script to recover stackstrings from malware. The script leverages code emulation to overcome this common string obfuscation technique. More precisely, it makes use of our flare-emu tool, which combines IDA Pro and the Unicorn emulation engine. In this blog post, I explain how our new script uses flare-emu to recover stackstrings from malware. In addition, I discuss flare-emu's event hooks and how you can use them to easily adapt the tool to your own analysis needs.
Analyzing strings in binary files is an important part of malware analysis. Although simple, this reverse engineering technique can provide valuable information about a program's use and its capabilities. This includes indicators of compromise like file paths and domain names. Especially during advanced analysis, strings are essential to understand a disassembled program's functionality. Malware authors know this, and string obfuscation is one of the most common anti-analysis techniques reverse engineers encounter.
Due to the prevalence of obfuscated strings, the FLARE team has already developed and shared various tools and techniques to deal with them. In 2014, we published an IDA Pro plugin to automate the recovery of constructed strings in malware. In 2016, we released FLOSS, a standalone open-source tool to automatically identify and decode strings in malware.
Both solutions rely on vivisect, a Python-based program analysis and emulation framework. Although vivisect is a robust tool, it may fail to completely analyze an executable file or emulate its code correctly. And just like any tool, vivisect is susceptible to anti-analysis techniques. With missing, incomplete, or erroneous processing by vivisect, dependent tools cannot provide the best results. Moreover, vivisect does not provide an easy-to-use graphical interface to interactively change and enhance program analysis.
I encountered all these shortcomings recently, when I analyzed a GandCrab ransomware sample (version 5.0.4, SHA256 hash: 72CB1061A10353051DA6241343A7479F73CB81044019EC9A9DB72C41D3B3A2C7). The malware contains various anti-analysis techniques to hinder disassembly and control-flow analysis. Before I could perform any efficient reverse engineering in IDA Pro, I had to overcome these hurdles. I used IDAPython to remove various anti-analysis instruction patterns, which then allowed the disassembler to successfully identify all functions in the binary. Many of the recovered functions contained obfuscated strings. Unfortunately, my changes did not propagate to vivisect, because it performs its own independent analysis on the original binary. Consequently, vivisect still failed to recognize most functions correctly and I couldn't use one of our existing solutions to recover the obfuscated strings.
While I could have tried to feed my patches in IDA Pro back to vivisect or to create a modified binary, I instead created a new IDAPython script that does not depend on vivisect, thus circumventing the mentioned shortcomings. It uses IDA Pro's program analysis and Unicorn's emulation engine. The easy integration of these two tools is powered by flare-emu.
Using IDA Pro instead of vivisect resolves multiple limitations of our previous implementations. Now changes that users make in their IDB file, e.g. by patching instructions to manually enhance analysis, are immediately available during emulation. Moreover, the tool more robustly supports different architectures including x86, AMD64, and ARM.
Stackstrings: An Example
The disassembly listing in Figure 1 shows an example string obfuscation from the sample I analyzed. The malware creates a string at run-time by moving each character into adjacent stack addresses (gray highlights). Finally, the sample passes the string's starting offset as an argument to the InternetOpen API call (blue highlight). Manually following these memory moves and restoring strings by hand is a very cumbersome process, especially if malware complicates value assignments using additional instructions, as illustrated below.
Figure 1: Disassembly listing showing stackstring creation and usage
Because malware often uses stack memory to create such strings, Jay Smith coined the term stackstrings for this anti-analysis technique. Note that malware can also construct strings in global memory. Our new script handles both cases: strings constructed on the stack and in global memory.
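To make the technique concrete, here is a minimal sketch in plain Python that replays byte-sized stores into a simulated stack frame and then reads the resulting string back. The frame size, offsets, and the string itself are made up for illustration and are not taken from the GandCrab sample.

```python
# Toy model of a stackstring: single-byte stores into adjacent stack slots.
# All offsets and values below are illustrative assumptions.
FRAME_SIZE = 0x20

# (offset into the frame, byte value) pairs, deliberately out of order, the way
# individual mov instructions may be interleaved with unrelated code.
writes = [
    (0x12, ord("l")),
    (0x10, ord("h")),
    (0x14, ord("o")),
    (0x11, ord("e")),
    (0x13, ord("l")),
    (0x15, 0x00),  # NUL terminator
]


def apply_writes(frame_size, byte_writes):
    """Replay byte-sized stores into a zero-initialized stack frame."""
    frame = bytearray(frame_size)
    for offset, value in byte_writes:
        frame[offset] = value
    return frame


def read_c_string(frame, start):
    """Read a NUL-terminated ASCII string beginning at the given offset."""
    end = frame.index(0, start)
    return frame[start:end].decode("ascii")


frame = apply_writes(FRAME_SIZE, writes)
print(read_c_string(frame, 0x10))  # prints "hello"
```

The string never exists as contiguous bytes in the file, which is why a static strings utility misses it while replaying (or emulating) the writes recovers it.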
ironstrings: Stackstring Recovery Using flare-emu
The new IDAPython script is an evolution of our existing solutions. It combines FLOSS's stackstring recovery algorithm and functionality from our IDA Pro plugin. The script relies on IDA Pro's program analysis and emulates code using Unicorn. The combination of both tools is powered by flare-emu. Fe, short for flare-emu, is the chemical symbol for iron, and hence the script is named ironstrings.
To recover stackstrings, ironstrings enumerates all disassembled functions in a program except for library and thunk functions as identified by IDA Pro. For each function, the script emulates various code paths through the function and searches for stackstrings based on two heuristics:
- Before every call instruction in the function, because stackstrings are often constructed and then passed to other functions, e.g., Windows APIs like CreateFile or InternetOpenUrl.
- At the end of a basic block containing more than five memory writes. The number of memory writes is configurable. This heuristic is helpful if the same memory buffer is used multiple times in a function and if the string construction spans multiple basic blocks.
If either of these conditions applies, the script searches the function's current stack frame for printable ASCII and UTF-16 strings. To detect strings in global memory, the script additionally searches for strings in all memory locations that have been written to.
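The search step can be approximated with two regular expressions over a raw memory buffer. This is a minimal sketch, not the code ironstrings uses: the minimum length of four characters, the helper name `extract_strings`, and the sample buffer are all assumptions made for illustration.

```python
import re

MIN_LENGTH = 4  # assumed minimum string length, not taken from ironstrings

# Runs of printable ASCII bytes.
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LENGTH)
# Runs of printable ASCII characters encoded as UTF-16LE (char followed by NUL).
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LENGTH)


def extract_strings(buf):
    """Return (offset, string) tuples for ASCII and UTF-16LE strings in buf."""
    found = []
    for match in ASCII_RE.finditer(buf):
        found.append((match.start(), match.group().decode("ascii")))
    for match in UTF16_RE.finditer(buf):
        found.append((match.start(), match.group().decode("utf-16-le")))
    return sorted(found)


# Hypothetical snapshot of a stack frame after emulation.
stack_frame = (
    b"\x00\x01"
    + b"http://example.com"
    + b"\x00\xff"
    + "cmd.exe".encode("utf-16-le")
    + b"\x00\x00"
)

for offset, value in extract_strings(stack_frame):
    print(hex(offset), value)
```

The same function can be applied to any other written memory region, which mirrors how the script treats strings built in global memory.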
Using flare-emu Hooks to Recover Stackstrings
If you're not already familiar with flare-emu, I recommend reading our previous blog post. It discusses some of the interfaces the tool provides. Other helpful resources are the examples and the project documentation available on the flare-emu GitHub.
The stackstrings script uses flare-emu's iterateAllPaths API. The function iterates multiple code paths through a function. It first finds possible paths from function start to function end. The tool then forces the emulation down all identified code paths independent from the actual program state. This extensive code coverage allows ironstrings to recover strings constructed from many different emulation runs.
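The path-finding idea can be illustrated on a toy control-flow graph. The sketch below is not flare-emu's implementation; it is a plain depth-first enumeration of loop-free paths, and the graph, the block names, and the `max_paths` limit are assumptions made for this example.

```python
def enumerate_paths(cfg, entry, max_paths=100):
    """Depth-first enumeration of loop-free paths from entry to any exit block.

    cfg maps a block name to the list of its successor block names.
    Paths that could only continue through a back edge are dropped in this sketch.
    """
    paths = []
    stack = [[entry]]
    while stack and len(paths) < max_paths:
        path = stack.pop()
        successors = cfg.get(path[-1], [])
        if not successors:
            paths.append(path)  # reached a block without successors: function end
            continue
        for succ in reversed(successors):
            if succ not in path:  # skip back edges so loops are not followed forever
                stack.append(path + [succ])
    return paths


# Hypothetical function with two branches and a shared exit block.
toy_cfg = {
    "entry": ["a", "b"],
    "a": ["exit"],
    "b": ["c", "exit"],
    "c": ["exit"],
    "exit": [],
}

for path in enumerate_paths(toy_cfg, "entry"):
    print(" -> ".join(path))
```

An emulator that is forced down each of these paths visits every string construction site at least once, regardless of which branch the real program state would have taken.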
A key feature of flare-emu is the set of hook functions that get triggered by different emulation events. These hooks, or callbacks, enable the development of very powerful automation tasks. The available hooks are a combination of Unicorn's standard hooks, e.g., to hook memory access events, and multiple convenience hooks provided by flare-emu. The following list briefly describes the available callbacks in flare-emu and illustrates how the ironstrings script uses them to recover obfuscated strings.
- instructionHook: This Unicorn standard hook is triggered before an instruction is emulated. ironstrings uses this hook to initiate the stackstrings extraction if a basic block contained enough memory writes, for example.
- memAccessHook: This Unicorn standard hook is triggered when memory read or write events occur during emulation. In the stackstrings script this function stores data about all memory writes.
- callHook: This flare-emu hook is activated before each function call. The hook's return value is ignored. In the stackstrings script this hook triggers the extraction of stackstrings.
- preEmuCallback: This flare-emu hook is called before each emulation run. It is only available in the iterate and the iterateAllPaths functions. The hook's return value is ignored. ironstrings does not use this hook.
- targetCallback: This flare-emu hook gets called whenever one of the specified target addresses is hit. It is only available in the iterate and the iterateAllPaths functions. The hook's return value is ignored. The stackstrings script does not use this hook.
The code in Figure 2 shows the callback functions that flare-emu's API currently supports, their signatures, and examples of how to use them. All callbacks receive an argument named hookData. This dictionary allows the user to provide application-specific data to use before, during, and after emulation. Often, this dictionary is named userData in the user-defined callbacks, as in the examples below, due to its naming in Unicorn. ironstrings uses this to access function analysis data and store recovered strings across its various hooks. The dictionary also provides access to the EmuHelper object and emulation metadata.
Figure 2: flare-emu example hook implementations
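The following is not the listing from Figure 2. It is a minimal, self-contained sketch of how hooks can cooperate through a shared dictionary, driven by simulated events instead of a real emulator. The callback parameter lists are modeled on the descriptions above and should be treated as assumptions, as should the `MEM_WRITE` constant, the dictionary keys, and all addresses.

```python
MEM_WRITE = "write"  # hypothetical stand-in for the emulator's write access type
MIN_WRITES = 5       # heuristic threshold: more than five writes in a basic block


def mem_access_hook(emu, access, address, size, value, user_data):
    """Record every memory write so later hooks can inspect written locations."""
    if access == MEM_WRITE:
        user_data["writes"].append((address, size, value))
        user_data["writes_in_block"] += 1


def instruction_hook(emu, address, size, user_data):
    """On entering a new basic block, check how many writes the previous one made.

    Simplification: the check runs at the start of the next block rather than
    at the end of the current one.
    """
    if address in user_data["block_starts"]:
        if user_data["writes_in_block"] > MIN_WRITES:
            user_data["extract_at"].append(address)
        user_data["writes_in_block"] = 0


def call_hook(address, argv, func_name, user_data):
    """Before every call, request a stackstring extraction."""
    user_data["extract_at"].append(address)


# Shared state passed to every hook, analogous to the hookData dictionary.
state = {
    "writes": [],
    "writes_in_block": 0,
    "block_starts": {0x1000, 0x1020},
    "extract_at": [],
}

# Simulated emulation run: one block with six byte writes, then a call.
instruction_hook(None, 0x1000, 1, state)
for i in range(6):
    mem_access_hook(None, MEM_WRITE, 0x2000 + i, 1, 0x41 + i, state)
instruction_hook(None, 0x1020, 1, state)
mem_access_hook(None, MEM_WRITE, 0x2010, 1, 0x00, state)
call_hook(0x1030, [0x2000], "InternetOpenA", state)

print([hex(address) for address in state["extract_at"]])
```

Keeping all bookkeeping in one dictionary means the hooks stay stateless functions, which is the same role hookData plays across an emulation run.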
Note that both flare-emu and ironstrings were written using the new IDAPython API available in IDA Pro 7.0 and higher. They are not backwards compatible with previous program versions.
Usage and Options
To run the script in IDA Pro, go to File > Script File... (ALT+F7) and select ironstrings.py. The script runs automatically on all functions, prints its results to IDA Pro's output window, and adds comments at the locations where it recovered stackstrings. Figure 3 shows the script's output of the recovered stackstring locations from the GandCrab sample. Analysis of this malware takes the script about 15 seconds.
Figure 3: Deobfuscated stackstrings and locations where they were identified
Figure 4 shows the disassembly listing of the stackstring creation example discussed at the beginning of this post after running ironstrings.
Figure 4: Commented stackstring after running ironstrings
After analyzing a sample, the script provides a summary and a unique listing of all recovered strings. The output for the ransomware sample is shown in Figure 5. Here the tool failed to analyze two functions due to invalid memory operations during Unicorn's code emulation.
Figure 5: Script summary and unique string listing
Note that you can modify various options to change the script's behavior. For example, you can configure the output format at the top of the ironstrings.py file. The script's README file explains the options in more detail.
This blog post explains how our new IDAPython script ironstrings works and how you can use it to automatically recover stackstrings in IDA Pro. Overcoming anti-analysis techniques is just one of many useful applications of code emulation for malware analysis. This post shows that flare-emu provides the ideal base for this by integrating IDA Pro and Unicorn. The detailed discussion of flare-emu's hook functions will help you to write your own powerful automation scripts. Please reach out to us with questions, suggestions and feedback via the flare-emu and flare-ida GitHub issue trackers.