@CrazySkull,
Here comes my semi-novelistic response...
I'm doing everything this package you speak of is doing, and more (added safety), minus the stupid and comical extra distributed "thunking" DLL dependency. Anyhow, there are many logical and safety-related things that these homemade "BSOD kits" aren't doing to make DLL injection from ring0 safe. Small things need attention for it to work correctly, such as injection into Windows protected processes (e.g. AudioDg.exe). For example (no idea whether this applies to your code), assume there's a usermode portion which does the actual DLL [un]injection. If that ring3 code calls OpenProcess() on the target process ID, it won't get the handle access rights it needs on a protected process, so how are you going to uninject? You end up with a DLL that cannot be unloaded, all because your driver injected it into a Windows protected process. Now, if you're using a driver to both inject and uninject, you can of course disregard that scenario. Injection into Windows protected processes does in fact ruin their integrity as well, since any loaded module has a hash check performed on it and compared against a catalog. That's another reason why you'd probably want your driver to refuse injection into them.
With 32-bit and 64-bit processes on x64 you can still use APCs seamlessly with a little bit of effort; there's just the wrapper Wow64ApcRoutine in your way, which can easily be pacified if you know what you're doing in a driver. There are other caveats on the x64 platform as well; here are a few more heart-warming examples that I project your way for learning purposes ;) You can safely allocate memory in the target process from inside a LoadImageNotifyRoutine while you're processing the mapped-in NTDLL image. If you're waiting for KERNEL32 to load, you can no longer "safely" allocate memory directly within the callback; doing so results in an unforgiving process deadlock, and for good reason: an image has already been mapped and your callback is called in order (procedurally), so a lock is still being held on the previously allocated, committed memory until the callback completes, which never happens since you've just deadlocked inside that very callback while trying to allocate more memory. These kernel image/process/thread callbacks aren't asynchronous, so keep this in mind.
Now, if you study ugly and problematic code such as TDSS/TDLx, you'll see that their authors use work items in order to run out-of-context from the executing LoadImageCallback... smart move, but still ugly. I would suggest queuing a kernel mode APC to the current thread, which then queues a usermode APC for delivery, since it's clean and always executes in the same target thread context. This is exactly what I am doing with my commercial package "nameless" - since I'd never want to encourage self-promotion ;) Also, please keep in mind that waiting for KERNEL32 to be mapped into WOW64/32-bit processes can be a little strange, since SysWOW64\KERNEL32 is actually mapped/unmapped several times during process initialization. I'd recommend that you use InterlockedExchange on the target process ID per KERNEL32 load instance to solve this issue.
Code:
    /* global var */
    volatile LONG gProcessId = 0;

    /* local var inside callback */
    LONG cProcessId = (LONG)(ULONG_PTR)PsGetCurrentProcessId();

    /* InterlockedExchange returns the previous value; if it equals this PID,
       the APC was already queued for this target after locating kernel32.dll */
    if (InterlockedExchange(&gProcessId, cProcessId) == cProcessId)
        return;

If you only wait for NTDLL, you'll notice that WOW64 processes map in a native 64-bit NTDLL as well as a 32-bit NTDLL. You can still safely allocate memory during either of these two loads inside the LoadImageNotifyRoutine callback, but I'd advise waiting for KERNEL32.DLL to be mapped afterwards, since you can access a ton of functions directly from the mapped image's export table while inside your kernel callback. Plus, all KERNEL32!LoadLibraryX variants fit perfectly inside the argument confines of KeInitializeApc and KeInsertQueueApc, unlike NTDLL!LdrLoadDll, which requires 4 arguments and creates extra work for you. ASLR is never an issue with DLLs such as NTDLL and KERNEL32.dll: the image base may change after rebooting, but the load virtual address is still shared across process boundaries and will always be the same in each process of the same bitness. It all comes down to retrieving proper 32-bit and 64-bit pointers to ntdll!LdrLoadDll or kernel32!LoadLibraryExW etc. and determining the target process bitness at the time of queuing your APC. With Wow64ApcRoutine, you'll want to split your CONTEXT argument in half, like this:
Code:
    /* Always 8 bytes wide on x64 */
    ULONG_PTR Ctx;

    if (Process32Bit)
    {
        /* low 32 bits: DLL name pointer, high 32 bits: load function pointer */
        ((PULONG)&Ctx)[0] = (ULONG)(ULONG_PTR)DLL;
        ((PULONG)&Ctx)[1] = (ULONG)(ULONG_PTR)LOADFUNC;
    }
    else
    /* .... */
    {
        /* 64-bit process
           Queue APC normally
        */
    }

Since you'll want to free your DLLName memory, and waiting on an APC to be delivered isn't a trivial matter at all and requires custom notifications to be triggered (like an event), you could check whether KeInsertQueueApc succeeds and then follow up by queuing a second APC directly behind it to free your previously allocated memory. This works perfectly since APCs are delivered in the order in which they arrive (FIFO: First In, First Out). If APC #1 uses the allocated memory to load the DLL, then APC #2 could call e.g. ZwFreeVirtualMemory on the original allocated region; it would be delivered immediately after your first APC is processed and your DLL is loaded, so nothing leaks. This is a wait-free memory deallocation solution and of course better than leaking memory, which most public examples do today, sadly.
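For anyone who wants to poke at the two ideas above outside a driver, here is a minimal user-mode sketch in portable C. It is only a model, not kernel code and not the package's implementation: pack_ctx, ctx_arg, ctx_routine, model_queue, model_inject and the rest are hypothetical names made up for illustration, and the "queue" is a plain array standing in for a thread's APC queue. It shows the 64-bit context split (low half argument, high half routine, which is what indexing the context as two ULONGs gives you on little-endian x64) and the "load APC first, cleanup APC right behind it" ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Pack two 32-bit values into one 64-bit context: low half = argument,
   high half = routine (same layout as ((PULONG)&Ctx)[0] / [1] on x64). */
static uint64_t pack_ctx(uint32_t arg, uint32_t routine)
{
    return ((uint64_t)routine << 32) | arg;
}

static uint32_t ctx_arg(uint64_t ctx)     { return (uint32_t)ctx; }
static uint32_t ctx_routine(uint64_t ctx) { return (uint32_t)(ctx >> 32); }

/* Hypothetical stand-in for a queued APC: a callback plus its context. */
typedef void (*model_apc_fn)(void *ctx);
struct model_apc { model_apc_fn fn; void *ctx; };

#define MODEL_QUEUE_CAP 4
struct model_queue {
    struct model_apc items[MODEL_QUEUE_CAP];
    size_t head, tail;
};

/* Append to the tail; returns 1 on success, 0 if the model queue is full. */
static int model_queue_insert(struct model_queue *q, model_apc_fn fn, void *ctx)
{
    if (q->tail == MODEL_QUEUE_CAP)
        return 0;
    q->items[q->tail].fn = fn;
    q->items[q->tail].ctx = ctx;
    q->tail++;
    return 1;
}

/* Deliver from the head, i.e. strictly first in, first out. */
static void model_queue_deliver(struct model_queue *q)
{
    while (q->head < q->tail) {
        struct model_apc a = q->items[q->head++];
        assert(a.fn != NULL);
        a.fn(a.ctx);
    }
}

struct dll_request {
    char *name;          /* stands in for the buffer allocated in the target */
    int   load_saw_name; /* set if the load ran while the buffer was valid  */
};

/* APC #1 in the model: "loads" the DLL, so it needs the name buffer. */
static void apc_load(void *ctx)
{
    struct dll_request *r = ctx;
    r->load_saw_name = (r->name != NULL);
}

/* APC #2 in the model: releases the buffer once the load has run. */
static void apc_free(void *ctx)
{
    struct dll_request *r = ctx;
    free(r->name);
    r->name = NULL;
}

/* Returns 1 if the load ran before the cleanup and the buffer was released. */
static int model_inject(const char *dll_name)
{
    struct model_queue q = {0};
    struct dll_request r = {0};
    size_t len = strlen(dll_name) + 1;

    r.name = malloc(len);
    if (r.name == NULL)
        return 0;
    memcpy(r.name, dll_name, len);

    /* Queue the cleanup directly behind the load; nothing is delivered yet,
       so on failure the buffer can simply be freed here. */
    if (!model_queue_insert(&q, apc_load, &r) ||
        !model_queue_insert(&q, apc_free, &r)) {
        free(r.name);
        return 0;
    }

    model_queue_deliver(&q);
    return r.load_saw_name && r.name == NULL;
}
```

Calling model_inject("example.dll") returns 1 because the cleanup entry sits behind the load entry in the queue; swap the two inserts and the load would see a freed (NULL) name. In a real driver the same ordering argument relies on both APCs going to the same thread's queue.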
I would suggest that you stay away from these kernel mode dependency-driven usermode "thunking layer" DLL examples, aka shitty, crash-prone public rootkit/malware sources, and even _some_ commercial kernel mode injection solutions, like a certain popular commercial package used for DLL injection (which, by the way, actually has to HOOK NtTestAlert to achieve DLL injection before the main program reaches its entrypoint, since NtTestAlert is called before the program runs in order to flush the target thread's APC queue). The Windows landfill is already full of BSOD garbage; who needs to hook function code in order to inject a DLL anyhow?!!! :lol:
Best Regards,
Brock
Accept nothing less than STATUS_SUCCESS