Analysing Suspicious File "Outstanding Payment.jar" - Part 1
Is antivirus a 100% protection against malicious files? What techniques are used by authors of malware to avoid detection? A series of articles describes our procedure during the static analysis of a suspicious Java file and reveals interesting findings about its structure as well as about the process of analysis as such.
Not so long ago, we received a suspicious file called “Outstanding Payment.jar”. Some antivirus products flagged this file as malware, but we were interested in more details. Most of all, we wanted to find out which specific activities this malware performs on the infected station and, where applicable, to identify the counterparts it communicates with.
After extracting the archive, it was obvious at first sight from the names of the individual files that this was an obfuscated Java application. The archive included:
- classes p5569.class to p5595.class,
- 3 binary files:
  - n720175702
  - n1436460589
  - n1139827387
Naturally, the first step was decompilation, using the Java Decompiler program (https://github.com/java-decompiler/jd-gui).
The names of the methods and variables were obfuscated as well, which made the decompiled code unreadable.
Layer I – Strings Obfuscation
One of the first things I focus on in such cases is text strings. It turned out that the application contains a number of short methods, each of which performs a short arithmetic calculation and returns what is probably a decoded string:
Since every method uses a different manner of calculation (a different set and sequence of operations), it wasn’t possible to create a unified decoding algorithm. On the other hand, the source code of each decoding method was at our disposal. Thus, it was enough to gather all these methods, write Java code that calls them, compile it, and save the resulting strings.
For this purpose, I wrote a short Python script that scanned all .java source files and found all such methods (name and return expression) with a simple regular expression.
From the gathered methods, it then generated the Java program StringDecoder.java, which, when run, decoded all strings and printed them in the following format: “METHOD_NAME [TAB] DECODED_STRING”. Since the total number of strings exceeded a thousand, I hit the 64 KB limit on the bytecode size of a single Java method. Therefore, I had to divide the decoding calls into several auxiliary blocks (methods m0-m11).
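The scan-and-generate step could look roughly like the sketch below. It is not the original script: the assumed shape of a decoder method (a parameterless static method whose body is a single return statement), the regular expression, and the names find_decoders and generate_decoder are all illustrative assumptions. The generated program copies each return expression instead of calling the obfuscated method, and spreads the print statements over helper methods m0, m1, ... to stay under the method size limit.

```python
import re

# Assumed (hypothetical) shape of a decoder method in the decompiled sources:
#   static String f123() { return <expression>; }
METHOD_RE = re.compile(
    r"static\s+String\s+(\w+)\s*\(\s*\)\s*\{\s*return\s+(.+?);\s*\}",
    re.DOTALL,
)


def find_decoders(java_source):
    """Return (method_name, return_expression) pairs found in one source file."""
    return [
        (name, " ".join(expr.split()))  # collapse line breaks inside the expression
        for name, expr in METHOD_RE.findall(java_source)
    ]


def generate_decoder(decoders, chunk_size=100):
    """Build the text of StringDecoder.java, split into helper methods m0, m1, ..."""
    chunks = [decoders[i:i + chunk_size] for i in range(0, len(decoders), chunk_size)]
    lines = ["public class StringDecoder {"]
    for index, chunk in enumerate(chunks):
        lines.append(f"    static void m{index}() {{")
        for name, expr in chunk:
            # Emits: System.out.println("NAME\t" + (EXPRESSION));
            lines.append(f'        System.out.println("{name}\\t" + ({expr}));')
        lines.append("    }")
    lines.append("    public static void main(String[] args) {")
    for index in range(len(chunks)):
        lines.append(f"        m{index}();")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

A regular expression of this kind breaks on expressions that contain the sequence "; }" inside a string literal, so the generated file should be checked before compiling.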
The majority of decoded strings were names of classes and methods, so we could expect to encounter a number of reflective calls. Among others, there were also cryptographic methods indicating encryption:
Last, but not least, the list of strings included this line:
Thus, already at this stage it was clear (or at least very probable) that this is a variant of Java malware called jRAT (Remote Administration Tool written in Java).
Because the names of the decoding methods were unique across the entire application, it was possible to replace their calls with the decoded strings using yet another short Python script. However, this did not help from the functional analysis point of view: the flow of the program (control flow) was still not visible, mainly because of the large number of short methods consisting of a few lines with meaningless names.
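A minimal sketch of such a replacement script follows, assuming the decoder output is available as “METHOD_NAME [TAB] DECODED_STRING” lines and that calls take no arguments. The function names load_mapping, java_literal and replace_calls are hypothetical, and the class and method names in the usage example are made up.

```python
import re


def load_mapping(tsv_text):
    """Parse 'METHOD_NAME<TAB>DECODED_STRING' lines into a dictionary."""
    mapping = {}
    for line in tsv_text.splitlines():
        if "\t" in line:
            name, decoded = line.split("\t", 1)
            mapping[name] = decoded
    return mapping


def java_literal(text):
    """Quote text as a Java string literal (backslashes and quotes escaped)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def replace_calls(java_source, mapping):
    """Replace calls such as p5570.f1() or f1() with the decoded literal."""
    if not mapping:
        return java_source
    names = "|".join(re.escape(n) for n in sorted(mapping, key=len, reverse=True))
    # Optional class qualifier, method name, empty argument list;
    # the lookahead skips the method definitions themselves ("f1() {").
    call_re = re.compile(r"(?:\b\w+\.)?\b(" + names + r")\s*\(\s*\)(?!\s*\{)")
    return call_re.sub(lambda m: java_literal(mapping[m.group(1)]), java_source)


mapping = load_mapping("f1\tjava.lang.String\nf2\tgetBytes\n")
print(replace_calls("Class.forName(p5570.f1()).getDeclaredMethod(f2(), new Class[0]);", mapping))
```

Relying on the method name alone is only safe because the names are unique across the application, as noted above.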
Layer I – Control Flow Obfuscation
So the next step was to inline the short method calls (the same expansion a compiler performs when it inlines function calls). It was to my advantage that all methods were static, so again it was enough to use several regular expressions to extract the definitions and replace each call directly with the body of the method.
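The inlining idea can be illustrated with the following sketch. It only handles the simplest case, parameterless static methods whose body is a single return statement, which is an assumption made for illustration; the real methods may have had parameters and longer bodies. The names collect_bodies and inline_calls are hypothetical.

```python
import re

# Assumed (hypothetical) shape: static <type> name() { return <expression>; }
DEF_RE = re.compile(
    r"static\s+[\w.\[\]<>]+\s+(\w+)\s*\(\s*\)\s*\{\s*return\s+(.+?);\s*\}",
    re.DOTALL,
)


def collect_bodies(java_source):
    """Map each short static method name to its return expression."""
    return {name: " ".join(expr.split()) for name, expr in DEF_RE.findall(java_source)}


def inline_calls(code, bodies, max_passes=20):
    """Replace calls with parenthesised bodies until nothing changes."""
    if not bodies:
        return code
    names = "|".join(re.escape(n) for n in sorted(bodies, key=len, reverse=True))
    call_re = re.compile(r"(?:\b\w+\.)?\b(" + names + r")\s*\(\s*\)(?!\s*\{)")
    for _ in range(max_passes):  # bounded, so recursive methods cannot loop forever
        expanded = call_re.sub(lambda m: "(" + bodies[m.group(1)] + ")", code)
        if expanded == code:
            break
        code = expanded
    return code


definitions = "class p1 { static int a() { return b() + 1; } static int b() { return 41; } }"
print(inline_calls("int x = p1.a();", collect_bodies(definitions)))
```

Several passes are needed because an inlined body may itself call another short method; wrapping each body in parentheses keeps operator precedence intact.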
It turned out that the whole application core is made of a state machine:
At the beginning of the program, the state variable is set to an initial value; then, in each iteration of the loop, one calculation step is performed and a new value of the state variable is set.
Apparently, the originally sequential code was transformed so that unique states were assigned to individual statements (or short blocks of code), and their order was subsequently reversed:
This technique is commonly used by automated obfuscators. When looking at such code, it is practically impossible to follow the flow of the program (sequence of steps, branching, loops ...).
Moreover, in this case, the flow of the program was not controlled by only one state variable. In addition to it, a stack was used where future states were stored and later taken out, which would make the manual step-by-step analysis very difficult.
Fortunately, it was possible to isolate the operations adjusting the state of the machine. These were the following 3 types of operations:
- assigning a numeric value to the state variable (p5595.f481768237511843 = ...),
- pushing a new value onto the stack (java.util.LinkedList.getDeclaredMethod("push", ...)),
- popping a value from the top of the stack (java.util.LinkedList.getDeclaredMethod("pop", ...)).
So yet again, it was enough to write a short Python script that loaded the whole machine and interpreted, step by step, the operations modifying the current state, while outputting the other statements of each visited state. The result was much more readable sequential code, suitable for manual analysis.
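The interpretation step can be sketched on a toy model. The machine below is a made-up example, not taken from the malware: each state maps to a list of operations, where "set", "push" and "pop" stand for the three state-changing operations listed above and "emit" stands for any other statement. It is assumed here that a popped value becomes the next state, and data-dependent branching is left out.

```python
def linearize(machine, initial_state, max_steps=10_000):
    """Walk the state machine and return the emitted statements in execution order."""
    state = initial_state
    stack = []
    output = []
    for _ in range(max_steps):  # bounded, so a cyclic machine still terminates
        if state not in machine:
            break  # unknown or missing next state: treat as end of program
        next_state = None
        for op in machine[state]:
            kind = op[0]
            if kind == "set":        # stateVariable = value
                next_state = op[1]
            elif kind == "push":     # store a future state on the stack
                stack.append(op[1])
            elif kind == "pop":      # take the next state from the stack
                next_state = stack.pop()
            else:                    # "emit": an ordinary statement, copied to the output
                output.append(op[1])
        state = next_state
    return output


# Toy machine whose states are listed in reverse order of execution.
toy_machine = {
    10: [("emit", "run(b);"), ("set", 0)],
    20: [("emit", "b = decode(a);"), ("pop",)],
    30: [("emit", "a = load();"), ("push", 10), ("set", 20)],
}
for statement in linearize(toy_machine, 30):
    print(statement)
```

Because only the state-changing operations are interpreted, the emitted statements never have to be executed, which keeps the approach safe for analysing malicious code.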
Layer I – Resources Encryption
It turned out that the application loads its 3 resources and gradually decodes files from them in a sophisticated way. Each resulting file is composed of several parts spread across the 3 resources, at different positions and with different sizes, and encrypted with the AES algorithm. Individual fragments are defined by strings in the following format: “size:offset:resId;size:offset:resId...”.
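Collecting the fragments named by such a descriptor can be sketched as follows. The function names and the sample data are illustrative assumptions; in particular, resId is treated here as an opaque key into a dictionary of resource contents, because the article does not say how the identifiers map to the three files. The AES decryption is left out, since the cipher mode and whether decryption happens per fragment or on the joined data are not stated.

```python
def parse_descriptor(descriptor):
    """Parse 'size:offset:resId;size:offset:resId...' into (size, offset, res_id) tuples."""
    fragments = []
    for part in descriptor.split(";"):
        if part:  # tolerate a trailing ';'
            size, offset, res_id = part.split(":")
            fragments.append((int(size), int(offset), res_id))
    return fragments


def assemble(descriptor, resources):
    """Concatenate the (still encrypted) fragments in descriptor order."""
    data = bytearray()
    for size, offset, res_id in parse_descriptor(descriptor):
        data += resources[res_id][offset:offset + size]
    return bytes(data)


# Made-up resource contents, standing in for the three binary files.
resources = {"0": b"..HEL....", "1": b"LO-", "2": b"xxWORLD"}
print(assemble("3:2:0;2:0:1;1:2:1;5:2:2", resources))
```

With the real resources, the dictionary would hold the raw bytes of the three binary files, and the assembled data would still have to be decrypted with the key recovered from the code.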
For a start, 3 files were decoded:
The Loader class is then used to decode more data; a file map is read and decoded first (a serialized Java object of type java.util.Map<String, String>):
After decoding, it is possible to read it in Java with the ObjectInputStream class:
This map contains file names and descriptions of their fragments (size:offset:resId;size:offset:resId...); the Loader class reads and decodes them one by one. The AES key is hard-coded in the application code and is the same for all decoded files.
Successfully decoding and joining all fragments revealed a whole set of new files:
Unfortunately, this is probably one more layer of obfuscation – you can see a number of classes with nonsensical names.
In any case, at this stage, I had cracked the first layer of the malware. Its main task was to deliver malicious code to the target station and avoid detection.
The malicious code itself was encrypted and fragmented across 3 binary files, which is why the antivirus was unable to match any signature.
The envelope code used 3 primary techniques to disguise itself:
- reflective calls, for example:
  java.lang.String.getDeclaredMethod("getBytes", new Class[0]).invoke("2173853979931072", new Object[0]);
- obfuscating text strings
- transforming the flow of the program (control flow)
I will address the analysis of the files that I managed to obtain in the second part of this article.