A forum for reverse engineering, OS internals and malware analysis 

 #21010  by Microwave89
 Sun Sep 29, 2013 10:49 am
Hi Community!

From my old book "Rootkits: Subverting the Windows Kernel" I learned about "MigBot", which managed to copy its code into the nonpaged pool and then unload its driver without actually terminating its threads.

I then tried to reproduce this behavior on my Windows 7 machine, although I was not able to use things like "__declspec(naked)" with the VS12 x64 toolchain.

After defining my autonomous function
Code: Select all
VOID myFunction(){
    DbgPrint("Hello from NonPagedPool!");
    for(;;){
    }
}
I then allocated 4096 bytes of NonPagedPoolExecute memory and copied "myFunction()" there.
Every time, the memory I was given was somewhere around address 0x4300000.

Then I created a new thread with PsCreateSystemThread(), passing the nonpaged pool address as the start routine.
But as soon as the thread tried to send the hello world debug print message, I BSoD'd my system.
So I commented out the DbgPrint call and tried again.
This time no BSoD came up at all, and Process Explorer showed a phantom thread running as expected (using 12.5% of CPU) at the memory address above.

So it seems to me that the compiler generates something like a "near call" to the DbgPrint function in driver code, as it expects the driver to sit in kernel space (iirc 0xFFFFF80000000000-0xFFFFFFFFFFFFFFFF), just as the DbgPrint function does.
However, in my experiment the actual code is copied to 0x4300000, so I would probably need the compiler to produce "far calls" instead.
But since there is no way to tell it so, the built-in code issues a "near call", which fails with an exception and leads to the blue screen mentioned above.


So for now, my question is: which approach do you propose for getting code to run without a parent driver?
Is it even possible to do so?
Maybe there is a way to tell the memory manager "Could you provide me with memory somewhere around 0xFFFFF8*?"


Best Regards

Microwave
 #21011  by Vrtule
 Sun Sep 29, 2013 3:17 pm
Hello,

a standard CALL instruction jumps relative to its own position. Hence, when you copy code that calls a function X and do not copy the function X itself (so that the difference between the address of X and the address of the CALL instruction is preserved), you get a BSOD with high probability, because the copied CALL instruction jumps to a place where no function X is located.

Even if the call instruction is an indirect one such as
Code: Select all
call [address]
you must change the address field, because it probably points into the Import Address Table of your driver's executable.
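
To make the relative-call arithmetic concrete, here is a minimal sketch in plain user-mode C. The helper names and the small addresses are made up for illustration and are not part of any API; the only fact relied on is that a near CALL (opcode E8) is 5 bytes long and adds its 32-bit displacement to the address of the next instruction.

```c
#include <stdint.h>

/* Target of a near CALL (opcode E8, 5 bytes long): the signed 32-bit
   displacement is added to the address of the NEXT instruction. */
static uint64_t call_rel32_target(uint64_t call_addr, int32_t rel32)
{
    return call_addr + 5 + (uint64_t)(int64_t)rel32;
}

/* Displacement that gets encoded for a given call site and target. */
static int32_t call_rel32_encode(uint64_t call_addr, uint64_t target)
{
    return (int32_t)(int64_t)(target - (call_addr + 5));
}
```

With a call site at 0x1000 and function X at 0x1400, the encoded displacement is 0x3FB. Copy the same bytes to 0x9000 and the call now resolves to 0x9400, which only contains X if X was copied along at the same distance.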

Executing code after unloading your driver is possible, however, you must be very careful when calling external routines and working with variables and constants stored in your executable.

I have done the MigBot sort of thing twice, and used a different approach each time.

The first approach
I copied not only the code, but also the variables and function addresses, to (non)paged memory. My code called external functions and worked with variables through an address table. For example, your Hello World experiment could look like this:
Code: Select all
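/* Function type matching DbgPrint: ULONG __cdecl DbgPrint(PCSTR Format, ...) */
typedef ULONG __cdecl DBGPRINT(PCSTR Format, ...);

typedef struct {
  DBGPRINT *DbgPrint;   /* absolute address, filled in before the thread starts */
} FUNCTION_TABLE, *PFUNCTION_TABLE;

VOID NonPagedFunctionOutsideDriver(PVOID Context)
{
   PFUNCTION_TABLE table = (PFUNCTION_TABLE)Context;

   /* Call through the table instead of the driver's import table */
   table->DbgPrint("HELLO WORLD!");

   return;
}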
You just need to:
1) copy the NonPagedFunctionOutsideDriver routine to (non)paged memory,
2) copy the FUNCTION_TABLE structure to (non)paged memory and fill in the address of DbgPrint correctly (DBGPRINT is a function type that corresponds to DbgPrint),
3) create a system thread and pass the address of the FUNCTION_TABLE as an argument to its routine (NonPagedFunctionOutsideDriver).
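
The table idea itself can be tried out safely in user mode. The following is only a sketch under my own assumptions: PRINT_ROUTINE, the Format and Message fields and RoutineUsingTable are hypothetical names, and printf stands in for DbgPrint. The strings are carried in the table too, on the assumption that string literals would otherwise stay behind in the original image.

```c
#include <stdio.h>

/* Function type matching a printf-style routine (stand-in for DBGPRINT). */
typedef int PRINT_ROUTINE(const char *format, ...);

typedef struct {
    PRINT_ROUTINE *Print;   /* absolute address, filled in by the caller */
    const char *Format;     /* data is reached through the table as well */
    const char *Message;
} FUNCTION_TABLE;

/* Uses nothing but its argument: no direct calls, no globals, no literals. */
static int RoutineUsingTable(void *context)
{
    FUNCTION_TABLE *table = (FUNCTION_TABLE *)context;

    return table->Print(table->Format, table->Message);
}
```

Filling the table with { printf, "%s\n", "HELLO WORLD!" } and calling RoutineUsingTable(&table) prints the message. The routine never refers to printf or to a string by name, which is the property that matters once the code is moved.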

The second approach
You can store your code in the form of a PE executable. Copy the executable to (non)paged memory. Once you fix its import table and perform relocations, most things should work.
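
As a rough illustration of what "perform relocations" means, here is a toy sketch that patches one relocation block in a plain byte buffer. It assumes the entry layout from the PE format documentation (high 4 bits are the type, low 12 bits the offset within the page; type 10 is a 64-bit absolute address); the function name and parameters are mine, and no real image is parsed here.

```c
#include <stdint.h>
#include <string.h>

#define REL_BASED_ABSOLUTE 0   /* padding entry, skipped */
#define REL_BASED_DIR64    10  /* 64-bit absolute address */

/* Apply one relocation block to a page of an image copied to a new base.
   page    : start of the 4 KiB page the block describes
   entries : 16-bit entries, high 4 bits = type, low 12 bits = page offset
   delta   : new base minus the preferred base the image was linked for
   Returns the number of 64-bit slots patched. */
static int apply_reloc_block(uint8_t *page, const uint16_t *entries,
                             size_t count, int64_t delta)
{
    int patched = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned type = entries[i] >> 12;
        unsigned offset = entries[i] & 0x0FFF;
        uint64_t value;

        if (type != REL_BASED_DIR64)
            continue;                      /* only DIR64 is handled here */

        memcpy(&value, page + offset, sizeof value);
        value += (uint64_t)delta;          /* rebase the stored address */
        memcpy(page + offset, &value, sizeof value);
        patched++;
    }
    return patched;
}
```

Every absolute address stored in the image is shifted by the same delta; everything that is encoded relatively needs no patching at all.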
-------------------------------------

Remember that on x64, RIP-relative addressing is used when working with variables and/or constants, hence the code is more position-independent than on x86.
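
A small sketch of why this helps, again in plain C with made-up addresses and hypothetical helper names: a RIP-relative operand stores only the distance from the end of the instruction to the data, so moving code and data together by the same amount leaves the encoded displacement valid.

```c
#include <stdint.h>

/* Displacement encoded for a RIP-relative operand: distance from the end of
   the instruction (address + length) to the data. */
static int32_t rip_disp32(uint64_t insn_addr, unsigned insn_len,
                          uint64_t data_addr)
{
    return (int32_t)(int64_t)(data_addr - (insn_addr + insn_len));
}

/* Address the CPU computes when the instruction executes at insn_addr. */
static uint64_t rip_target(uint64_t insn_addr, unsigned insn_len,
                           int32_t disp32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

For a 7-byte instruction at 0x1000 that refers to data at 0x3000, the displacement is 0x1FF9. If both are moved up by 0x8000, the unchanged displacement resolves to 0xB000, which is exactly where the data now lives. Move only the code, and 0xB000 holds whatever happens to be there.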

Hope my post helps you in some way

Vrtule
 #22865  by Microwave89
 Wed May 14, 2014 3:10 pm
Hi Vrtule,

Sorry for the late answer. At the time I tried this, I really didn't understand much.
I wasn't even able to understand what a DBGPRINT type is and how to set it up properly in my code.
I made further attempts, but I could not obtain any results except continuing BSODs.

While successfully doing the same in userland (refer to my x64 usermode rootkit) I learned where I had gone wrong.
Now I just copy almost the entire driver image (beginning at .text) into the NonPagedPoolExecute pool and launch a new thread.
And thanks to x64 relative addressing (I didn't get what you told me about that either...) there is not a single relocation/address change required.

Now the orphaned code tells me "Hello World" every second, and an additional LoadImageNotifyRoutine (orphaned too) reports each image as it is being loaded.
I also managed to create DriverObjects arbitrarily, so now I will try to register my code with the I/O manager in order to intercept various (file) operations.


Best Regards

Microwave
 #22868  by Vrtule
 Wed May 14, 2014 5:32 pm
Hello,

DBGPRINT is just a type describing the DbgPrint routine of the ntoskrnl.exe module.

When I am doing something, I usually try to make it as portable as possible. I usually do not focus on a single platform (such as x64); my code works on both x86 and x64 systems. That's why I mentioned RIP-relative addressing: it makes a difference between x86 and x64. I just wanted to point out the difference, nothing more.

I am glad that you have finally succeeded.

Vrtule