Воробьёвы

(-:

Klister 0.4 by Joanna Rutkowska is a nice piece of code that enumerates existing threads by walking KiWaitInListHead, KiWaitOutListHead and KiDispatcherReadyListHead array. It was designed specially for w2k and doesn't handle any other OS. Joanna makes an assumption that every thread, either waiting or ready for execution, must belong to one of these lists. However, we all remember Joanna's mistake in klister 0.3 - she thought that every process in the OS must have unique process identifier. So, is Joanna now right when she says that "it is rather impossible to remove threads of the hidden process from the just mentioned dispatcher internal lists, because then, our hidden process, will not get any CPU time for execution" [1]? The answer is no. It is possible to remove threads from those lists and to run our own thread scheduler that will schedule only our hidden threads.

See how I've done this in the PHIDE2 engine.



1. Introduction
2. A few words about how the NT scheduler works
3. Code pullout
3.1. Engine info
3.2. Usage
4. Phide2
4.1. What is necessary to run a new scheduler
4.2. How to find needed non-exported symbols
4.2.1. NtYieldExecution
4.2.2. KiWait[In/Out]ListHead, KeDispatcherReadyListHead, KiReadySummary, KiStackOutSwapRequest
4.2.3. KeBalanceSetManager, KeSwapProcessOrStack
4.2.4. KiStackInSwapListHead, KiProcess[In/Out]SwapListHead, KiSwapEvent
4.2.5. KiReadyQueueIndex
4.2.6. PsActiveProcessHead
4.3. Usage
5. Conclusion
6. Greets
7. Related links



2. A few words about how the NT scheduler works
-----------------------------------------------


Near to the end of the phase 1 initialization MmInitSystem starts two threads: KeBalanceSetManager and KeSwapProcessOrStack. This is the balance set manager. KeSwapProcessOrStack starts an endless loop in which it process swap events. Swap events are signaled by KiSwapEvent. There are 4 types of them:

- outswap kernel stacks (specified by BOOLEAN KiStackOutSwapRequest);
- outswap processes (processes that need outswapping are stored in the KiProcessOutSwapListHead);
- inswap processes (processes that need inswapping are stored in the KiProcessInSwapListHead);
- inswap kernel stacks (threads that need inswapping are stored in the KiStackInSwapListHead).

KeBalanceSetManager also loops forever and waits for a MmWorkingSetManagerEvent (to trim working sets when memory is low) and for a timer. Timer event handler periodically sets KiStackOutSwapRequest to TRUE and signals the KiSwapEvent to inform KeSwapProcessOrStack that it has to outswap kernel stacks of threads that were waiting something for a long time. KeBalanceSetManager also calls KiScanReadyQueues that boosts prioriries of the threads in the ready queues (KiDispatcherReadyListHead array). For each boosted thread KiReadyThread is called, so it is possible that PRCB.NextThread is set to the boosted thread immediately (KiReadyThread will preempt original NextThread).

KeUpdateSystemTime is called directly from hal by the timer interrupt handler. It then calls KeUpdateRunTime "to update the runtime of the current thread, update the runtime of the current thread's process, and decrement the current thread's quantum". When KeUpdateRunTime realizes that current thread is not the Idle thread and its quantum is over, it requests quantum end by firing a dispatch interrupt. KiDispatchInterrupt checks if quantum end was requested and PRCB.NextThread has been already chosen. If so, it sets PRCB.CurrentThread to PRCB.NextThread, zeroes PRCB.NextThread and readies PRCB.CurrentThread for execution by calling KiReadyThread.
KiReadyThread checks for thread's process state and inswaps its memory if the process was outswapped (it inserts process in the KiProcessInSwapListHead and signals KiSwapEvent). If the thread's kernel stack was not resident, KiReadyThread inserts thread in the KiStackInSwapListHead and signals KiSwapEvent to inswap thread's kernel stack. If the thread's process memory and thread's kernel stack were resident, KiReadyThread searches for idle processor and if there is at least one idle processor it sets respective PRCB's NextThread to specified thread. If there are many idle processors preference is given to the thread's ideal processor.
If there are no idle processors IdealProcessorPRCB.NextThread's priority is checked. If IdealProcessorPRCB.NextThread can be preempted (specified thread has higher priority), KiReadyThread sets it to the specified thread. If IdealProcessorPRCB.NextThread was not set KiReadyThread checks if IdealProcessorPRCB.CurrentThread can be preempted. If yes, then NextThread is set to the specified thread and dispatch interrupt is re-requested. If no thread can be preempted KiReadyThread inserts thread in the dispatcher queue (KiDispatcherReadyListHead) selected by its priority and corrects respective KiReadySummary bit to indicate that this priority array is not empty. After thread has been prepared, KiDispatchInterrupt calls SwapContext that swaps context from one thread to the next.


3. Code pullout
---------------

3.1. Engine info
----------------


This engine was already described in Kernel mode antihooking by building our own kernel


3.2. Usage
----------


An usage example you can see with the main Phide2 engine source. Here is a proto for the Pullout engine.


NTSTATUS
NTAPI
PullOutCode(
         // filled array of IMPORT_ENTRY structures
   IN PIMPORT_ENTRY Import,
         // array for code entries in the resulting code
   IN OUT PULONG CodeEntries,
         // address of the buffer with the resulting code
   OUT PUCHAR *NewCode OPTIONAL,
         // size of the resulting code
   OUT PULONG NewCodeSize OPTIONAL,
         // imagebase of the module which imports need to be included to the code
   IN PUCHAR pModuleForImportPatching OPTIONAL,
         // user-defined callback for fixing absolute pointers in the resulting code
   IN FIXRELOCS_CALLBACK FixRelocsCallback OPTIONAL
);


A callback for patching relocs in the generated code:


typedef
VOID
(__stdcall *FIXRELOCS_CALLBACK)(
         // imagebase of the parsed module
   PUCHAR pImage,
         // address of the pointer in the resulting code
   PULONG pFixedAddress,
         // RVA of the pointer in the parsed module
   ULONG OriginalFixupRva,
         // RVA this pointer points to in the parsed module
   ULONG TargetRva
);


Engine may return one of the following status codes.


STATUS_SUCCESS
     Everything went fine, copy of the code is ready to use.

STATUS_PASSIVE_LEVEL_REQUIRED
     PullOutCode() was called at raised IRQL.

STATUS_NTOSKRNL_NOT_FOUND
     Engine has failed to find kernel image on disk. Weird.

STATUS_MAP_IMAGE_FAILED
     Engine has failed to map kernel image.

STATUS_ADD_FUNCTION_FAILED
     Engine  has  failed  to add a function to the list of entry points for a
code walker (export not found, for example).

STATUS_COVERAGE_ERROR
     Engine has failed to build a code tree.

STATUS_CODE_REBUILDING_FAILED
     Engine has failed to fix copied code.

STATUS_UNSUCCESSFUL
     General error occured.



4. Phide2
---------

4.1. What is necessary to run a new scheduler
---------------------------------------------


When we exclude threads from KiWaitInListHead, KiWaitOutListHead and KiDispatcherReadyListHead klister doesn't see them any more. The problem is: how to make those excluded threads schedulable? We have to give them quantums from time to time. But we cannot call KiReadyThread, because SwapContext will bugcheck system when it will attempt to read data from the hidden thread's context, because if the hidden thread was in the waiting state with its stack outswapped (a very common situation) then

- unlinking from KiWaitIn/OutListHead makes KE unable to successfully satisfy thread's wait;
- thread's stack cannot be inswapped.

The solution is simple: we should create a new lists, remove hidden threads from original lists, place them to the ours and then create a sufficient copy of the scheduler code with all pointers to the old lists patched to the our lists. When we will call that copied code we will be sure that one of the hidden threads is tested to have higher priority than the current (probably not hidden) thread, thus making current thread preempted if it has lower priority. Furthermore, hidden thread will never be moved to the original scheduler lists because our copy of the code just doesn't know where the old lists are. There is a handy function NtYieldExecution that "yields execution to any ready thread for up to one quantum". If we patch its full execution tree to use our lists instead of original, it will schedule to execution only our threads. Another thing we should do is to run our own balance manager, because someone must inswap kernel stacks of the hidden threads when their wait is just satisfied and they don't have their kernel stacks resident.

These pointers must be patched in the execution tree of NtYieldExecution, KeBalanceSetManager and KeSwapProcessOrStack to separate our copy of scheduler code from the original one and make it viable:

KiDispatcherReadyListHead (array of 32 LIST_ENTRY - patch Flink and Blink)
KiWaitInListHead (LIST_ENTRY)
KiWaitOutListHead (LIST_ENTRY) (XP specific: only KiWaitListHead LIST_ENTRY instead of above two)

KiStackInSwapListHead (NT, 2k: LIST_ENTRY, XP: SINGLE_LIST_ENTRY)
KiProcessInSwapListHead (NT, 2k: LIST_ENTRY, XP: SINGLE_LIST_ENTRY)
KiProcessOutSwapListHead (NT, 2k: LIST_ENTRY, XP: SINGLE_LIST_ENTRY)

KiReadySummary (ULONG)
KiReadyQueueIndex (ULONG)
KiStackOutSwapRequest (BOOLEAN)
KiSwapEvent (KEVENT)
KiSwappingThread (PETHREAD) (XP specific: NT and 2k don't have this global)

We need 4 threads to make our scheduler online.

1st is the patched KeBalanceSetManager;
2nd is the patched KeSwapProcessOrStack;
3rd will periodically exclude hidden objects (threads and processes) from the original lists and insert them in the our new lists - this is needed because some functions like to place threads to the dispatcher lists (KeSetThreadPriority, etc);
4th will periodically call patched NtYieldExecution to give quantums to the hidden threads.


4.2. How to find needed non-exported symbols
--------------------------------------------


Hardcoded offsets are Really Bad. There are too many different ntoskrnls out there - checked/free builds, uni/multiprocessor, PAE, various hotfixes etc. So I decided to find all needed symbols by analyzing the kernel and without any code signatures.


4.2.1. NtYieldExecution
-----------------------


This is the easiest one. ZwYieldExecution is exported by ntoskrnl. Thus, we can find its service id by reading imm32 in the first "mov eax, imm32" instruction. Then we should get NtYieldExecution address by looking to the KiServiceTable array, which may be found this way: [2].


4.2.2. KiWait[In/Out]ListHead, KeDispatcherReadyListHead, KiReadySummary
------------------------------------------------------------------------


First, we find KiWaitInListHead and KiWaitOutListHead (KiWaitListHead in XP). It can be done by walking execution tree of the KeWaitForSingleObject, KeWaitForMultipleObjects and KeDelayExecutionThread (all exported) without entering their subfunctions. For each function we make a list of globals that are used by it (without subfunctions). Then we compute the intersection of these 3 sets of globals. This intersection consists of KiWaitInListHead.Flink, KiWaitOutListHead.Flink, KeTickCount.LowPart in NT/2k and KiWaitListHead.Flink, KiWaitListHead.Blink, KeTickCount.LowPart in XP. KeTickCount is also exported, so we easily exclude it from the resulting set. If the remaining two set elements have next addresses (Flink/Blink), we have a xp kernel with KiWaitListHead. Otherwise, we found KiWaitInListHead and KiWaitOutListHead. We don't care which of two found globals is KiWaitInListHead and which is KiWaitOutListHead. Second, we find KiDispatcherReadyListHead and KiReadySummary. We find shared globals of KeSetAffinityThread and NtYieldExecution in NT/2k and KeDelayExecutionThread and NtYieldExecution in XP. This way we should find KiDispatcherReadyListHead and KiReadySummary, but we don't know what is what. KiDispatcherReadyListHead is a pointer, so it will be strange to meet "or KiDispatcherReadyListHead, " generated by the compiler. But ORs are very common for KiReadySummary because of the SetMember macro. So we enumerate all absolute pointer addressing in the ntoskrnl and check for some kind of OR instruction that modifies one of two given globals. The global for which OR is found will be KiReadySummary, the other will be KiDispatcherReadyListHead.


4.2.3. KeBalanceSetManager, KeSwapProcessOrStack
------------------------------------------------


These two globals are pushed as StartRoutine parameters to the PsCreateSystemThread calls made by MmInitSystem. We begin to walk code from the ntoskrnl entry point (it will come to MmInitSystem sooner or later). For every "push" and "pop" we met we properly correct a virtual "esp" and save every "push" argument. When we come to a "call PsCreateSystemThread", we look to the our stack trace buffer and find 6th parameter - this will be the StartRoutine. We don't care about subs/adds to esp, because compiler doesn't generate them when it makes a call to the stdcall function.

After we find another StartRoutine we should check - is it really a KeBalanceSetManager or KeSwapProcessOrStack. We make a list of calls made by this StartRoutine and check if found set of calls includes the "signature set" for KeBalanceSetManager and KeSwapProcessOrStack. NT4 KeBalanceSetManager should call at least
- KeSetPriorityThread,
- KeInitializeTimer,
- KeSetTimer,
- KeWaitForMultipleObjects.

NT4 KeSwapProcessOrStack should call at least
- MmQuerySystemSize,
- KeSetPriorityThread,
- KeWaitForSingleObject.

2k/XP KeBalanceSetManager should call at least
- MmQuerySystemSize,
- KeSetPriorityThread,
- KeInitializeTimer,
- KeSetTimer,
- KeWaitForMultipleObjects.

2k/XP KeSwapProcessOrStack should call at least
- KeSetPriorityThread,
- KeWaitForSingleObject.

This check is sufficient to distinguish KeBalanceSetManager and KeSwapProcessOrStack threads from the others.


4.2.4. KiStackInSwapListHead, KiProcess[In/Out]SwapListHead, KiSwapEvent, KiStackOutSwapRequest
------------------------------------------------


Now we find globals used by the KeBalanceSetManager and the KeSwapProcessOrStack. KeSwapProcessOrStack uses these globals:
- KiSwapEvent,
- KiStackOutSwapRequest,
- KiProcessOutSwapListHead,
- KiProcessInSwapListHead,
- KiStackInSwapListHead,
- KiStackProtectTime (NT4 only - it was moved to KeBalanceSetManager later),
- KiSwappingThread (XP only).

KeBalanceSetManager uses these globals:
- MmWorkingSetManagerEvent,
- KiStackProtectTime (NT4 KeBalanceSetManager doesn't have it),
- KiStackOutSwapRequest,
- KiSwapEvent (XP KeBalanceSetManager doesn't have it, it calls KiSetSwapEvent instead).

That is, KiStackOutSwapRequest is always shared and we found it. Now we should find KiSwapEvent. Note that KiSwapEvent is the only PKEVENT in the globals used by KeSwapProcessOrStack - we will use this. The idea is to enumerate all pointers in the ntoskrnl and find those that look like PKEVENTs - they should have "KEVENT.Header.Size=4" mov instruction. We know that KiSwapEvent is a SynchronizationEvent. That is, it's initialized by KeInitializeEventMacro which has the following mov instruction: "KEVENT.Header.Type = SynchronizationEvent". By checking presence of these two attributes in the KeSwapProcessOrStack globals we find the KiSwapEvent pointer.

Next, we should find three Ki*ListHead pointers. If we examine KeSwapProcessOrStack, we see that the first global used by it in XP is KiSwappingThread, and in NT4 it is KiStackProtectTime. 2k's KeSwapProcessOrStack has only 5 globals, two of them we've already found. If the kernel is NT4 or XP we simply skip its first used global. The rest 3 globals are those we're searching for: KiStackInSwapListHead and KiProcess[In/Out]SwapListHead.

The last thing to do is to find a KiStackInSwapListHead in those three pointers. This is needed, because new scheduler's balmgr operates differently with processes and threads lists. We will use the fact that KiStackInSwapListHead is always processed by last from the KiProcess[In/Out]SwapListHead and KiStackInSwapListHead in KiInitSystem and KeSwapProcessOrStack. If we find a code that uses all three of those lists very close to each other (this is only in two places of the kernel - in the initialization of these lists - KiInitSystem - and in the KeSwapProcessOrStack branching) we take the last one and that will be the KiStackInSwapListHead.


4.2.5. KiReadyQueueIndex
------------------------


KiReadyQueueIndex is used only by the KiScanReadyQueues, which is called by KeBalanceSetManager. So, we must check all subfunctions of the KeBalanceSetManager. For every function it calls, we build a list of the globals that are used by it. KiReadyQueueIndex uses these globals:
- KiReadySummary,
- KeTickCount,
- KiReadyQueueIndex,
- KiDispatcherReadyListHead,
- two data offsets for arguments to the RtlAssert call (only checked builds).

We know addresses of the KeTickCount, KiReadySummary and KiDispatcherReadyListHead, and we assume that KiReadyQueueIndex cannot be pushed on stack (it's value is only MOVed actually). This is enough: the KiScanReadyQueues candidate must have 4 globals on free builds and 6 on checked (two of them must be PUSHed), 3 of them must be already found. The last check for the KiReadyQueueIndex is this: its data value in the kernel image (not in memory) should be 1.


4.2.6. PsActiveProcessHead
--------------------------


This address is not necessary for the new scheduler, but we have to find it since we're not going to hide only from klister. We will use the fact that crash dump file header format is not changed between NT and XP. Dump files header starts from "PAGEDUMP" signature and they have PsActiveProcessHead value stored at file offset +0x1C. Dump files are written by IoWriteCrashDump function. The write buffer is prepared this way: 1st IoWriteCrashDump fills buffer memory with "PAGE", then "DUMP" is written at offset +0x4. After a while PsActiveProcessHead is written at offset +0x1C by the "mov dword ptr [reg32+1Ch], offset _PsActiveProcessHead" instruction.
So, our algo will be simple: we enumerate all executable sections in the ntoskrnl and find the "PAGE" constant. Then we check if there is a "mov [reg32+4], 'PMUD'" instruction nearby. If it is, we search for the "mov [reg32+1Ch], imm32" instruction in the next 100 instructions. If we found this imm32, we check if it is a pointer and not a data constant by walking all absolute pointers in the kernel image.


4.3. Usage
----------



NTSTATUS ProcessHide(
        IN ISPROCESSHIDDEN_CALLBACK IsProcessHidden
);


Main engine entry. Must be called at PASSIVE_LEVEL. The parameter specifies the user callback function that will take a PEPROCESS as an argument and will decide - should this process be hidden or not. Don't even think about hiding System (#8) process


NTSTATUS ShutdownPhide(
);


Shutdown function. Note that it doesn't stop two patched system threads (the KeBalanceSetManager and the KeSwapProcessOrStack). Therefore, it doesn't free their code memory.


typedef
BOOLEAN
(__stdcall *ISPROCESSHIDDEN_CALLBACK)(
        PEPROCESS Process
);


User supplied function that makes a main decision.


Engine may return one of the following status codes.


STATUS_SUCCESS
     Everything went fine, scheduler is online.

STATUS_ALREADY_STARTED
     An attempt to run engine twice. Interact with your callback instead.

STATUS_UNSUPPORTED_OS
     This OS is not supported by the engine (2k3 and above).


Refer to the 3.2. point for these.


STATUS_PASSIVE_LEVEL_REQUIRED
STATUS_NTOSKRNL_NOT_FOUND
STATUS_MAP_IMAGE_FAILED
STATUS_ADD_FUNCTION_FAILED
STATUS_COVERAGE_ERROR
STATUS_CODE_REBUILDING_FAILED
STATUS_UNSUCCESSFUL



5. Conclusion
-------------


As you can see, it's possible to evade klister-like tools with no hooks in just a few kilobytes of code. So if Joanna has written a private tool for detecting hidden processes that works on all NTs I would like to see it :) It's funny to write a powerful hiding code for OS that doesn't have advanced process detector available to the public yet (I mean NT4 and XP).
All that "security thru obscurity" private tool concept is a ridiculous thing, isn't it? Btw, 2k3 server support is yet to come ;)


6. Greets
---------


- Opc0de, who helped me with ideas and bughunting;
- nobodi, who helped the "scheduler idea" to be born ;)
- unknown heroes, who made nt4 and w2k sources available to the public;
- Joanna Rutkowska, for writing klister :)
- all engine betatesters.

PS. Happy New Year, Joanna!