TL;DR: Through dynamic loading, malware authors can covertly load malicious code into their application in order to avoid detection. We can detect such loading through the application’s kernel-generated /proc/[PID]/maps file.
Recently, we created a simple script that allows us to detect dynamic loading in Android apps. This presented us with a good opportunity to discuss dynamic loading in general in this blog.
Dynamic Loading and Linking:
In order to understand the rest of the blog, it is important to go over the basics of linking and loading, and to highlight the differences between linking and loading, and between static and dynamic.
What Is Linking:
The process of compilation has multiple parts. Linking is the last step before we get a runnable executable. A program is usually more than a simple self-contained file, and relies on other libraries or files to operate. As such, it’s not enough to compile your code into machine code to be able to run it; the different files also have to be linked into a cohesive executable.
How is this done in practice? After compilation and assembly, the assembler outputs object files, which usually correspond to each module in your program. Such files can be either relocatable or executable. Object files of the first variety have to undergo linking before they can be run, but ones of the second type can be executed immediately. In the GNU/Linux ecosystem, the usual extension for an object file is .o. Shared object files, commonly known as shared libraries, are object files that are intended to be used by many different programs, and can’t be run on their own. In GNU/Linux, they have the extension .so (Shared Object), whereas in Windows they have the .dll (Dynamic-link Library) extension.
The linker then takes the object files, resolves the symbols (functions and variables), and points them to the correct memory addresses, recording everything in the symbol table of the executable. For the sake of performance, it also relocates your code so that related pieces of code end up mapped to nearby memory addresses, regardless of how you originally organized your program; human readability is no longer a concern at this point.
The Many Different Names of the Executable:
In both Linux and Windows, those seemingly different extensions share the same underlying format as executables. In Windows, they are PEs (Portable Executable), and usually have the extensions .exe and .dll. In Linux, they are ELFs (Executable and Linkable Format), and have the extensions .bin, .so or .o, or often no extension at all (yes, the object files we mentioned earlier are also ELFs).
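To make the point about extensions concrete, here is a minimal sketch that classifies a file by its first bytes rather than by its name. It relies on the ELF magic bytes (0x7f followed by "ELF"), the e_type header field, and the "MZ" signature that PE files start with. The function name identify_binary and the wording of the labels are our own and purely illustrative.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {1: "relocatable (.o)", 2: "executable", 3: "shared object or PIE"}


def identify_binary(header: bytes) -> str:
    """Classify a file from its first bytes, ignoring its extension."""
    if header[:4] == b"\x7fELF" and len(header) >= 18:
        # e_ident[5] (EI_DATA) gives the byte order: 1 = little, 2 = big.
        byte_order = "<" if header[5] == 1 else ">"
        # e_type is a 16-bit field right after the 16-byte e_ident array.
        (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
        return "ELF, " + ELF_TYPES.get(e_type, "other type")
    if header[:2] == b"MZ":
        return "PE (or other MZ-based format)"
    return "unknown"
```

To try it on a real file, read the first 18 bytes in binary mode and pass them in, for example identify_binary(open(path, "rb").read(18)).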
We say that something has been dynamically linked when the linking is done at load-time or runtime instead of at compile-time. Which brings us to loading: a loader simply copies the contents of the linker’s output to memory, and runs the program. When dynamic linking is used, the linking process occurs just before loading, when running the program, which often leads to confusion between the two.
Dynamic loading, on the other hand, means that parts of the code can be loaded to memory at any point during runtime. The two processes can be, and frequently are, done together.
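As a rough analogy (this is Python, not Android code), the sketch below loads code that is not part of the program until the moment it is requested, using the standard importlib machinery. The helper load_module_from_path, the file name plugin.py and the function method1 are hypothetical names chosen for illustration.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name: str, path: str):
    """Load Python source from an arbitrary path at runtime."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Until this call, none of the file's code exists in the process.
    spec.loader.exec_module(module)
    return module


# Hypothetical "plugin" written to disk while the program is already running.
with tempfile.TemporaryDirectory() as tmp:
    plugin_path = os.path.join(tmp, "plugin.py")
    with open(plugin_path, "w") as f:
        f.write("def method1():\n    return 'loaded at runtime'\n")
    plugin = load_module_from_path("plugin", plugin_path)
    result = plugin.method1()

print(result)
```

Nothing in the program’s own source reveals what method1 does. Its behaviour is decided entirely by a file that shows up later, which is the property that makes dynamic loading attractive to malware authors.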
So Why Is This a Problem?
Dynamic loading is certainly useful. For example, rather than loading all libraries your program will use at load-time, you could load them up only when you need to use them, thus using less memory, or only conditionally load them in certain cases.
But it also presents an easy way for malware developers to hide their malicious code. They can put all of their legitimate code in the APK, and move all of the nefarious code into a DEX (Dalvik Executable) file that the application downloads and then dynamically loads during use. This makes the application appear innocuous upon basic static inspection of the APK.
How are DEX Classes Loaded:
Android offers an option for dynamically loading .dex files using a class called DexClassLoader. To load a class, we simply need to write:
import dalvik.system.DexClassLoader;
import java.lang.reflect.Method;

// Init the loader (the last argument is the parent class loader):
DexClassLoader dexClassLoader = new DexClassLoader(path_to_dex, null, null, parent_class);
// Load the class:
Class<?> dynamic_class = dexClassLoader.loadClass("DynamicClass");
// Look up a method that we can call:
Method method = dynamic_class.getMethod("method1");

We can then call the method through its invoke().
The /proc/[PID]/maps File:
In Unix, everything is a file, and even if it isn’t really one, it is handled and accessed like one. This includes the Kernel’s data structures, and Linux is no exception to the rule. The Linux Kernel allows us to access and read its data structures through the /proc/ pseudo-file system. Each process then has its own folder at /proc/[PID]. The files and subfolders here hold plenty of useful and important information about the process, but today we will focus on just one file: /proc/[PID]/maps.
/proc/[PID]/maps lists the mapped memory of a process. By mapped memory, we mean a virtual memory segment that corresponds to a file (or to part of one). Such a mapping enables an application to access, and possibly modify, a file by reading and writing directly to memory. Code has to be mapped into memory before it can run, so when a program loads a library or another file containing code, the mapping ends up being recorded in its /proc/[PID]/maps file. Files that are only accessed through ordinary read and write calls do not appear there.
/proc/[PID]/maps also shows us what permissions the process has for each segment. This can help us determine whether a mapped file can only be read, or whether it can also be written to or executed.
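A quick way to see this for yourself on a Linux machine is to map a file and look at your own process through /proc/self/maps. The sketch below is only an illustration: the helper maps_lines_for is a name we made up, and the temporary file stands in for whatever file a real program would map.

```python
import mmap
import os
import tempfile


def maps_lines_for(path: str) -> list[str]:
    """Return this process's /proc/self/maps rows that mention path (Linux only)."""
    with open("/proc/self/maps") as maps:
        return [line.rstrip("\n") for line in maps if path in line]


with tempfile.NamedTemporaryFile(suffix=".bin") as backing:
    backing.write(b"\x00" * mmap.PAGESIZE)
    backing.flush()
    path = os.path.realpath(backing.name)

    before = maps_lines_for(path)      # opened and written to, but not mapped
    with mmap.mmap(backing.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
        during = maps_lines_for(path)  # mapped read-only
    after = maps_lines_for(path)       # unmapped again

print(before, during, after, sep="\n")
```

Only the middle list contains a row for the file: opening and writing to it leaves no trace in the maps file, while mapping it adds a row with read-only permissions that disappears once the mapping is closed.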
Here is how a short segment of a normal /proc/[PID]/maps file looks:
7f9cefbf7000-7f9cefbf8000 r--p 00000000 103:03 1589169 /usr/lib/libXcomposite.so.1.0.0
7f9cefbf8000-7f9cefbf9000 r-xp 00001000 103:03 1589169 /usr/lib/libXcomposite.so.1.0.0
7f9cefbf9000-7f9cefbfa000 r--p 00002000 103:03 1589169 /usr/lib/libXcomposite.so.1.0.0
7f9cefbfa000-7f9cefbfb000 r--p 00002000 103:03 1589169 /usr/lib/libXcomposite.so.1.0.0
7f9cefbfb000-7f9cefbfc000 rw-p 00003000 103:03 1589169 /usr/lib/libXcomposite.so.1.0.0
7f9cefbfc000-7f9cefc08000 r--p 00000000 103:03 1579223 /usr/lib/libxcb.so.1.1.0
7f9cefc08000-7f9cefc1b000 r-xp 0000c000 103:03 1579223 /usr/lib/libxcb.so.1.1.0
7f9cefc1b000-7f9cefc24000 r--p 0001f000 103:03 1579223 /usr/lib/libxcb.so.1.1.0
7f9cefc24000-7f9cefc25000 r--p 00027000 103:03 1579223 /usr/lib/libxcb.so.1.1.0
7f9cefc25000-7f9cefc26000 rw-p 00028000 103:03 1579223 /usr/lib/libxcb.so.1.1.0
7f9cefc26000-7f9cefc27000 r--p 00000000 103:03 1577111 /usr/lib/libX11-xcb.so.1.0.0
7f9cefc27000-7f9cefc28000 r-xp 00001000 103:03 1577111 /usr/lib/libX11-xcb.so.1.0.0
7f9cefc28000-7f9cefc29000 r--p 00002000 103:03 1577111 /usr/lib/libX11-xcb.so.1.0.0
7f9cefc29000-7f9cefc2a000 r--p 00002000 103:03 1577111 /usr/lib/libX11-xcb.so.1.0.0
7f9cefc2a000-7f9cefc2b000 rw-p 00003000 103:03 1577111 /usr/lib/libX11-xcb.so.1.0.0
7f9cefc2b000-7f9cefc47000 r--p 00000000 103:03 1584005 /usr/lib/libX11.so.6.3.0
7f9cefc47000-7f9cefcd1000 r-xp 0001c000 103:03 1584005 /usr/lib/libX11.so.6.3.0
Each row in the file records a single memory segment in the contiguous virtual memory address space allocated to the process. The columns, from left to right, are:
- Address – The first column shows the starting and ending address of the segment.
- Permissions – This column shows which permissions the process has for the segment. r/w/x are the usual read/write/execute, while the last letter is s or p, meaning shared or private, respectively.
- Offset – This is the offset, from the beginning of the file, at which the mapped data starts. Sometimes a segment isn’t mapped from a file, in which case the offset is simply left as 0 (and the path column holds an identifier for the nature of the segment, as explained below).
- Device – When the segment was mapped from a file, this column shows the major and minor numbers, in hex, of the device where the file is stored.
- Inode (Index Node) – If the segment came from a file, this is the inode number of the file.
- Path – This is the path for the file, if there is one. This can be [heap], [stack] or [vdso] if the segment is the eponymous structure.
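The column layout above maps naturally onto a small parser. The following is a minimal sketch; MapEntry and parse_maps_line are names we made up for illustration and are not part of any library.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    device: str
    inode: int
    path: str


def parse_maps_line(line: str) -> MapEntry:
    """Split one /proc/[PID]/maps row into its six columns."""
    # The path is optional and may contain spaces, so split at most 5 times.
    fields = line.split(maxsplit=5)
    addresses, perms, offset, device, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addresses.split("-"))
    return MapEntry(start, end, perms, int(offset, 16), device, int(inode), path)


entry = parse_maps_line(
    "7f9cefbf8000-7f9cefbf9000 r-xp 00001000 103:03 1589169 /usr/lib/libXcomposite.so.1.0.0"
)
print(entry.path, entry.perms, hex(entry.end - entry.start))
# prints: /usr/lib/libXcomposite.so.1.0.0 r-xp 0x1000
```

Addresses and offsets are parsed as hexadecimal, while the inode is decimal. The device is kept as a string, since it is really a major:minor pair rather than a single number.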
Our Humble Script:
In offending Android applications, dynamic loading of code is usually done from the home directory of the application, so files loaded to memory from that directory (or other common locations) ought to appear in the maps file. To automate the task of checking that file, we constructed a very simple script that uses a regex to search for the string ‘/data/data’ in the maps file of a given PID on the connected device and returns the lines that match. /data/data is where each app’s home directory lives (as /data/data/<package name>), and it is where the app stores its files and data, which makes it the most likely location for an app to load a downloaded DEX file from.
Interacting with ADB through Python was sometimes a challenge, as doing so using the shell can be awkward, but we found a useful set of tools called pwntools that provides exactly that functionality. It offers a vast set of functionalities designed to help with hacking and prototyping. Definitely check it out.
You can see the full script we use at this GitHub gist.
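The filtering step described above can be sketched in a few lines. This is not our actual script: it assumes plain subprocess calls to the adb command instead of pwntools, it assumes the device shell is allowed to read the target’s maps file (which typically requires root), and the package name and payload.dex path in the sample are invented.

```python
import re
import subprocess

# Pattern from the article: any mapping whose path mentions /data/data.
APP_DATA_PATTERN = re.compile(r"/data/data")


def find_app_data_mappings(maps_text: str) -> list[str]:
    """Return the maps rows that reference a file under /data/data."""
    return [line for line in maps_text.splitlines() if APP_DATA_PATTERN.search(line)]


def read_maps_over_adb(pid: int) -> str:
    """Assumption: adb is on PATH and the shell user may read the file."""
    result = subprocess.run(
        ["adb", "shell", "cat", f"/proc/{pid}/maps"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample standing in for real device output.
sample = """\
7f9cefbf8000-7f9cefbf9000 r-xp 00001000 103:03 1589169 /system/lib64/libc.so
7f9cefc26000-7f9cefc27000 r--p 00000000 fd:00 42 /data/data/com.example.app/files/payload.dex
"""
print(find_app_data_mappings(sample))
```

Against a real device, find_app_data_mappings(read_maps_over_adb(pid)) would return the suspicious rows for the given PID.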