Processes, Threads, and Jobs in the Windows Operating System
- 6/17/2009
Flow of CreateProcess
So far in this chapter, you’ve seen the structures that are part of a process and the API functions with which you (and the operating system) can manipulate processes. You’ve also found out how you can use tools to view how processes interact with your system. But how did those processes come into being, and how do they exit once they’ve fulfilled their purpose? In the following sections, you’ll discover how a Windows process comes to life.
A Windows subsystem process is created when an application calls one of the process creation functions, such as CreateProcess, CreateProcessAsUser, CreateProcessWithTokenW, or CreateProcessWithLogonW. Creating a Windows process consists of several stages carried out in three parts of the operating system: the Windows client-side library Kernel32.dll (in the case of the CreateProcessAsUser, CreateProcessWithTokenW, and CreateProcessWithLogonW routines, part of the work is first done in Advapi32.dll), the Windows executive, and the Windows subsystem process (Csrss).
Because of the multiple environment subsystem architecture of Windows, creating an executive process object (which other subsystems can use) is separated from the work involved in creating a Windows subsystem process. So, although the following description of the flow of the Windows CreateProcess function is complicated, keep in mind that part of the work is specific to the semantics added by the Windows subsystem as opposed to the core work needed to create an executive process object.
The following list summarizes the main stages of creating a process with the Windows CreateProcess function. The operations performed in each stage are described in detail in the subsequent sections. Some of these operations are performed by CreateProcess itself (or other helper routines in user mode), while others are performed by NtCreateUserProcess or one of its helper routines in kernel mode. The detailed analysis that follows distinguishes between the two at each step.
Validate parameters; convert Windows subsystem flags and options to their native counterparts; parse, validate, and convert the attribute list to its native counterpart.
Open the image file (.exe) to be executed inside the process.
Create the Windows executive process object.
Create the initial thread (stack, context, and Windows executive thread object).
Perform post-creation, Windows-subsystem-specific process initialization.
Start execution of the initial thread (unless the CREATE_SUSPENDED flag was specified).
In the context of the new process and thread, complete the initialization of the address space (such as load required DLLs) and begin execution of the program.
Figure 5-5 shows an overview of the stages Windows follows to create a process.
Figure 5-5. The main stages of process creation
Stage 1: Converting and Validating Parameters and Flags
Before opening the executable image to run, CreateProcess performs the following steps:
In CreateProcess, the priority class for the new process is specified as independent bits in the CreationFlags parameter, so more than one priority class can be specified in a single CreateProcess call. Windows resolves the conflict by choosing the lowest priority class that is set.
If no priority class is specified for the new process, the priority class defaults to Normal unless the priority class of the creating process is Idle or Below Normal, in which case the new process inherits the creator’s priority class.
If the Real-time priority class is specified for the new process and the caller doesn’t have the Increase Scheduling Priority privilege, the High priority class is used instead. In other words, CreateProcess doesn’t fail just because the caller lacks the privilege to create the process in the Real-time priority class; the new process simply won’t have as high a priority as Real-time.
All windows are associated with desktops, the graphical representation of a workspace. If no desktop is specified in CreateProcess, the process is associated with the caller’s current desktop.
If the process is part of a job object, but the creation flags requested a separate virtual DOS machine (VDM), the flag is ignored.
If the caller is sending a handle to a monitor as an output handle instead of a console handle, standard handle flags are ignored.
If the creation flags specify that the process will be debugged, Kernel32 initiates a connection to the native debugging code in Ntdll.dll by calling DbgUiConnectToDbg and gets a handle to the debug object from the thread environment block (TEB) once the function returns.
Kernel32.dll sets the default hard error mode if the creation flags specified one.
The user-specified attribute list is converted from Windows subsystem format to native format, and internal attributes are added to it.
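The priority-class rules in the first three steps above can be condensed into one function. This is a minimal sketch, not Kernel32’s code: the constants use the winbase.h values of the priority-class flags, while resolve_priority_class and its parameters are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Priority-class bits, using the values defined in winbase.h. */
#define IDLE_PRIORITY_CLASS          0x00000040u
#define BELOW_NORMAL_PRIORITY_CLASS  0x00004000u
#define NORMAL_PRIORITY_CLASS        0x00000020u
#define ABOVE_NORMAL_PRIORITY_CLASS  0x00008000u
#define HIGH_PRIORITY_CLASS          0x00000080u
#define REALTIME_PRIORITY_CLASS      0x00000100u

/* Hypothetical helper applying the Stage 1 priority-class rules. */
uint32_t resolve_priority_class(uint32_t creation_flags,
                                uint32_t creator_class,
                                bool has_increase_sched_priority_privilege)
{
    /* Ordered from lowest to highest priority: the first bit found wins. */
    static const uint32_t lowest_first[] = {
        IDLE_PRIORITY_CLASS, BELOW_NORMAL_PRIORITY_CLASS,
        NORMAL_PRIORITY_CLASS, ABOVE_NORMAL_PRIORITY_CLASS,
        HIGH_PRIORITY_CLASS, REALTIME_PRIORITY_CLASS,
    };

    for (size_t i = 0; i < sizeof lowest_first / sizeof lowest_first[0]; i++) {
        uint32_t cls = lowest_first[i];
        if (creation_flags & cls) {
            /* Real-time quietly degrades to High without the privilege. */
            if (cls == REALTIME_PRIORITY_CLASS &&
                !has_increase_sched_priority_privilege)
                return HIGH_PRIORITY_CLASS;
            return cls;
        }
    }

    /* Nothing requested: Normal, unless the creator is Idle or Below Normal. */
    if (creator_class == IDLE_PRIORITY_CLASS ||
        creator_class == BELOW_NORMAL_PRIORITY_CLASS)
        return creator_class;
    return NORMAL_PRIORITY_CLASS;
}
```

For example, passing HIGH_PRIORITY_CLASS | IDLE_PRIORITY_CLASS yields Idle, and a Real-time request from an unprivileged caller yields High rather than an error.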
Once these steps are completed, CreateProcess will perform the initial call to NtCreateUserProcess to attempt creation of the process. Because Kernel32.dll has no idea at this point whether the application image name is a real Windows application, or if it might be a POSIX, 16-bit, or DOS application, the call may fail, at which point CreateProcess will look at the error reason and attempt to correct the situation.
Stage 2: Opening the Image to Be Executed
As illustrated in Figure 5-6, the first stage in NtCreateUserProcess is to find the appropriate Windows image that will run the executable file specified by the caller and to create a section object to later map it into the address space of the new process. If the call fails for any reason, it returns to CreateProcess with a failure state (see Table 5-6) that tells CreateProcess either to try again with a different image or to fail the call.
If the executable file specified is a Windows .exe, NtCreateUserProcess will try to open the file and create a section object for it. The object isn’t mapped into memory yet, but it is opened. Just because a section object has been successfully created doesn’t mean that the file is a valid Windows image, however; it could be a DLL or a POSIX executable. If the file is a POSIX executable, the image to be run changes to Posix.exe, and CreateProcess restarts from the beginning of Stage 1. If the file is a DLL, CreateProcess fails.
Now that NtCreateUserProcess has found a valid Windows executable image, as part of the process creation code described in Stage 3 it looks in the registry under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options to see whether a subkey with the file name and extension of the executable image (but without the directory and path information—for example, Image.exe) exists there. If it does, PspAllocateProcess looks for a value named Debugger for that key. If this value is present, the image to be run becomes the string in that value and CreateProcess restarts at Stage 1.
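The subkey that is probed depends only on the file name and extension. The sketch below builds that key path from a full image path; image_file_name and ifeo_subkey_for are hypothetical helpers, and treating backslash, forward slash, and the drive colon as separators is an assumption of the sketch. It does not open the registry, where key names are matched case-insensitively.

```c
#include <stdio.h>
#include <string.h>

/* Registry key named in the text (shown with the HKLM abbreviation). */
static const char IFEO_KEY[] =
    "HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
    "Image File Execution Options";

/* Returns the file name and extension, with the directory part stripped. */
const char *image_file_name(const char *path)
{
    const char *name = path;
    for (const char *p = path; *p != '\0'; p++) {
        if (strchr("\\/:", *p) != NULL)   /* assumed separator characters */
            name = p + 1;
    }
    return name;
}

/* Formats the subkey that would be checked for a Debugger value.
   Returns 0 on success or -1 if out is too small. */
int ifeo_subkey_for(const char *path, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s\\%s", IFEO_KEY, image_file_name(path));
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

So C:\Tools\Image.exe and D:\Other\Image.exe map to the same subkey, which is why a Debugger value applies to every copy of an image with that name.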
On the other hand, if the image is not a Windows .exe (for example, if it’s an MS-DOS, Win16, or a POSIX application), CreateProcess goes through a series of steps to find a Windows support image to run it. This process is necessary because non-Windows applications aren’t run directly—Windows instead uses one of a few special support images that in turn are responsible for actually running the non-Windows program. For example, if you attempt to run a POSIX application, CreateProcess identifies it as such and changes the image to be run to the Windows executable file Posix.exe. If you attempt to run an MS-DOS or a Win16 executable, the image to be run becomes the Windows executable Ntvdm.exe. In short, you can’t directly create a process that is not a Windows process. If Windows can’t find a way to resolve the activated image as a Windows process (as shown in Table 5-6), CreateProcess fails.
Figure 5-6. Choosing a Windows image to activate
Table 5-6. Decision Tree for Stage 1 of CreateProcess
If the Image . . . | Create State Code | This Image Will Run . . . | . . . and This Will Happen |
Is a POSIX executable file | PsCreateSuccess | Posix.exe | CreateProcess restarts Stage 1. |
Is an MS-DOS application with an .exe, a .com, or a .pif extension | PsCreateFailOnSectionCreate | Ntvdm.exe | CreateProcess restarts Stage 1. |
Is a Win16 application | PsCreateFailOnSectionCreate | Ntvdm.exe | CreateProcess restarts Stage 1. |
Is a Win64 application on a 32-bit system (or a PPC, MIPS, or Alpha Binary) | PsCreateFailMachineMismatch | N/A | CreateProcess will fail. |
Has a Debugger key with another image name | PsCreateFailExeName | Name specified in the Debugger key | CreateProcess restarts Stage 1. |
Is an invalid or damaged Windows EXE | PsCreateFailExeFormat | N/A | CreateProcess will fail. |
Cannot be opened | PsCreateFailOnFileOpen | N/A | CreateProcess will fail. |
Is a command procedure (application with a .bat or a .cmd extension) | PsCreateFailOnSectionCreate | Cmd.exe | CreateProcess restarts Stage 1. |
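Table 5-6 is essentially a lookup from the kind of image to a create state and a follow-up action, and the sketch below encodes it directly. The image_kind enumeration, decide, and the support-image constants are hypothetical; the PS_CREATE_STATE enumerator order here is arbitrary rather than the kernel’s definition. The sketch also ignores the reuse of an already running VDM, which the list below the table describes.

```c
#include <stddef.h>

/* Create-state codes from Table 5-6 (order is illustrative only). */
typedef enum {
    PsCreateFailOnFileOpen,
    PsCreateFailOnSectionCreate,
    PsCreateFailExeFormat,
    PsCreateFailMachineMismatch,
    PsCreateFailExeName,
    PsCreateSuccess
} PS_CREATE_STATE;

/* Hypothetical classification of the image the caller asked for. */
typedef enum {
    IMG_WINDOWS_EXE, IMG_POSIX, IMG_MSDOS, IMG_WIN16, IMG_BATCH,
    IMG_DEBUGGER_KEY, IMG_MACHINE_MISMATCH, IMG_DAMAGED, IMG_UNOPENABLE
} image_kind;

typedef enum { ACTION_RUN, ACTION_RESTART_STAGE1, ACTION_FAIL } next_action;

/* Support images from the table. */
static const char POSIX_IMAGE[] = "Posix.exe";
static const char NTVDM_IMAGE[] = "Ntvdm.exe";
static const char CMD_IMAGE[]   = "Cmd.exe";

typedef struct {
    PS_CREATE_STATE state;   /* what NtCreateUserProcess reports */
    const char *next_image;  /* image to retry with, or NULL */
    next_action action;      /* what CreateProcess does next */
} decision;

decision decide(image_kind kind, const char *debugger_value)
{
    switch (kind) {
    case IMG_WINDOWS_EXE:
        return (decision){PsCreateSuccess, NULL, ACTION_RUN};
    case IMG_POSIX:
        return (decision){PsCreateSuccess, POSIX_IMAGE, ACTION_RESTART_STAGE1};
    case IMG_MSDOS:
    case IMG_WIN16:
        return (decision){PsCreateFailOnSectionCreate, NTVDM_IMAGE, ACTION_RESTART_STAGE1};
    case IMG_BATCH:
        return (decision){PsCreateFailOnSectionCreate, CMD_IMAGE, ACTION_RESTART_STAGE1};
    case IMG_DEBUGGER_KEY:
        return (decision){PsCreateFailExeName, debugger_value, ACTION_RESTART_STAGE1};
    case IMG_MACHINE_MISMATCH:
        return (decision){PsCreateFailMachineMismatch, NULL, ACTION_FAIL};
    case IMG_DAMAGED:
        return (decision){PsCreateFailExeFormat, NULL, ACTION_FAIL};
    case IMG_UNOPENABLE:
    default:
        return (decision){PsCreateFailOnFileOpen, NULL, ACTION_FAIL};
    }
}
```

Note the POSIX row: the create state reports success, yet CreateProcess still restarts with a different image, so the action column cannot be derived from the state code alone.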
Specifically, the decision tree that CreateProcess goes through to run an image is as follows:
If the image is an MS-DOS application with an .exe, a .com, or a .pif extension, a message is sent to the Windows subsystem to check whether an MS-DOS support process (Ntvdm.exe, specified in the registry value HKLM\SYSTEM\CurrentControlSet\Control\WOW\cmdline) has already been created for this session. If a support process has been created, it is used to run the MS-DOS application. (The Windows subsystem sends the message to the VDM [Virtual DOS Machine] process to run the new image.) Then CreateProcess returns. If a support process hasn’t been created, the image to be run changes to Ntvdm.exe and CreateProcess restarts at Stage 1.
If the file to run has a .bat or a .cmd extension, the image to be run becomes Cmd.exe, the Windows command prompt, and CreateProcess restarts at Stage 1. (The name of the batch file is passed as the first parameter to Cmd.exe.)
If the image is a Win16 (Windows 3.1) executable, CreateProcess must decide whether a new VDM process must be created to run it or whether it should use the default sessionwide shared VDM process (which might not yet have been created). The CreateProcess flags CREATE_SEPARATE_WOW_VDM and CREATE_SHARED_WOW_VDM control this decision. If these flags aren’t specified, the registry value HKLM\SYSTEM\CurrentControlSet\Control\WOW\DefaultSeparateVDM dictates the default behavior. If the application is to be run in a separate VDM, the image to be run changes to the value of HKLM\SYSTEM\CurrentControlSet\Control\WOW\wowcmdline and CreateProcess restarts at Stage 1. Otherwise, the Windows subsystem sends a message to see whether the shared VDM process exists and can be used. (If the VDM process is running on a different desktop or isn’t running under the same security as the caller, it can’t be used and a new VDM process must be created.) If a shared VDM process can be used, the Windows subsystem sends a message to it to run the new image and CreateProcess returns. If the VDM process hasn’t yet been created (or if it exists but can’t be used), the image to be run changes to the VDM support image and CreateProcess restarts at Stage 1.
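The Win16 branch above is a small decision procedure. The sketch below restates it; the flag values match winbase.h, while wow_decision, shared_vdm_info, and choose_wow_vdm are hypothetical, and letting CREATE_SEPARATE_WOW_VDM win when both flags are set is an assumption the text does not address.

```c
#include <stdbool.h>
#include <stdint.h>

#define CREATE_SEPARATE_WOW_VDM 0x00000800u   /* winbase.h value */
#define CREATE_SHARED_WOW_VDM   0x00001000u   /* winbase.h value */

typedef enum {
    WOW_RUN_IN_SHARED_VDM,   /* Csrss hands the image to the existing VDM */
    WOW_START_SEPARATE_VDM,  /* restart Stage 1 with the wowcmdline image */
    WOW_START_SHARED_VDM     /* restart Stage 1 with the VDM support image */
} wow_decision;

/* Hypothetical description of the sessionwide shared VDM, if any. */
typedef struct {
    bool exists;
    bool same_desktop;    /* running on the caller's desktop */
    bool same_security;   /* running under the caller's security */
} shared_vdm_info;

wow_decision choose_wow_vdm(uint32_t creation_flags,
                            bool default_separate_vdm, /* DefaultSeparateVDM */
                            shared_vdm_info vdm)
{
    bool separate;

    if (creation_flags & CREATE_SEPARATE_WOW_VDM)
        separate = true;                  /* assumption: wins if both are set */
    else if (creation_flags & CREATE_SHARED_WOW_VDM)
        separate = false;
    else
        separate = default_separate_vdm;  /* registry default applies */

    if (separate)
        return WOW_START_SEPARATE_VDM;
    if (vdm.exists && vdm.same_desktop && vdm.same_security)
        return WOW_RUN_IN_SHARED_VDM;
    return WOW_START_SHARED_VDM;          /* missing or unusable shared VDM */
}
```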
Stage 3: Creating the Windows Executive Process Object (PspAllocateProcess)
At this point, NtCreateUserProcess has opened a valid Windows executable file and created a section object to map it into the new process address space. Next it creates a Windows executive process object to run the image by calling the internal system function PspAllocateProcess. Creating the executive process object (which is done by the creating thread) involves the following substages:
Setting up the EPROCESS block
Creating the initial process address space
Initializing the kernel process block (KPROCESS)
Concluding the setup of the process address space (which includes initializing the working set list and virtual address space descriptors and mapping the image into address space)
Setting up the PEB
Stage 3A: Setting Up the EPROCESS Block
This substage involves the following steps:
Allocate and initialize the Windows EPROCESS block.
Inherit the Windows device namespace (including the definition of drive letters, COM ports, and so on).
Inherit the process affinity mask and page priority from the parent process. If there is no parent process, the default page priority (5) is used, and an affinity mask of all processors (KeActiveProcessors) is used.
Set the new process’s quota block to the address of its parent process’s quota block, and increment the reference count for the parent’s quota block. If the process was created through CreateProcessAsUser, this step won’t occur.
The process minimum and maximum working set sizes are set to the values of PspMinimumWorkingSet and PspMaximumWorkingSet, respectively. These values can be overridden if performance options were specified in the PerfOptions key of Image File Execution Options, in which case the maximum working set is taken from there.
Store the parent process’s process ID in the InheritedFromUniqueProcessId field in the new process object.
Attach the process to the session of the parent process.
Initialize the KPROCESS part of the process object. (See Stage 3C.)
Create the process’s primary access token (a duplicate of its parent’s primary token). New processes inherit the security profile of their parents. If the CreateProcessAsUser function is being used to specify a different access token for the new process, the token is then changed appropriately.
The process handle table is initialized. If the caller requested handle inheritance (the bInheritHandles parameter of CreateProcess), any inheritable handles are copied from the parent’s object handle table into the new process. (For more information about object handle tables, see Chapter 3.) A process attribute can also be used to specify only a subset of handles, which is useful when you are using CreateProcessAsUser to restrict which objects should be inherited by the child process.
If performance options were specified through the PerfOptions key, these are now applied. The PerfOptions key includes overrides for the working set limit, I/O priority, page priority, and CPU priority class of the process.
The process priority class and quantum are computed and set.
Set the new process’s exit status to STATUS_PENDING.
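The handle-inheritance step can be sketched as a filter over the parent’s table. In the Windows SDK the subset is expressed with the PROC_THREAD_ATTRIBUTE_HANDLE_LIST attribute; here handle_entry, inherit_handles, and the allow-list parameters are hypothetical, simplified stand-ins rather than the kernel’s handle-table structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified handle-table entry. */
typedef struct {
    uintptr_t handle;
    bool inheritable;   /* marked inheritable when it was created or duplicated */
} handle_entry;

static bool in_list(uintptr_t h, const uintptr_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == h)
            return true;
    return false;
}

/* Copies inherited handles into child and returns how many were copied.
   allow_list == NULL means "every inheritable handle". */
size_t inherit_handles(const handle_entry *parent, size_t parent_count,
                       bool inherit_flag,
                       const uintptr_t *allow_list, size_t allow_count,
                       handle_entry *child, size_t child_capacity)
{
    size_t copied = 0;

    if (!inherit_flag)
        return 0;                        /* child starts with no inherited handles */
    for (size_t i = 0; i < parent_count && copied < child_capacity; i++) {
        if (!parent[i].inheritable)
            continue;                    /* only inheritable handles qualify */
        if (allow_list != NULL &&
            !in_list(parent[i].handle, allow_list, allow_count))
            continue;                    /* restricted to the attribute's subset */
        child[copied++] = parent[i];     /* same handle value in the child */
    }
    return copied;
}
```

Keeping the same handle value matters because the child usually learns about an inherited handle through its command line or some other channel that carries the parent’s numeric value.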
Stage 3B: Creating the Initial Process Address Space
The initial process address space consists of the following pages:
Page directory (there may be more than one on systems with more than two levels of page tables, such as x86 systems in PAE mode or 64-bit systems)
Hyperspace page
Working set list
To create these three pages, the following steps are taken:
Page table entries are created in the appropriate page tables to map the initial pages.
The number of pages is deducted from the kernel variable MmTotalCommittedPages and added to MmProcessCommit.
The systemwide default process minimum working set size (PsMinimumWorkingSet) is deducted from MmResidentAvailablePages.
The page table pages for the nonpaged portion of system space and the system cache are mapped into the process.
Stage 3C: Creating the Kernel Process Block
The next stage of PspAllocateProcess is the initialization of the KPROCESS block. This work is performed by KeInitializeProcess, and the resulting KPROCESS contains:
A pointer to a list of kernel threads. (The kernel has no knowledge of handles, so it bypasses the object table.)
A pointer to the process’s page table directory (which is used to keep track of the process’s virtual address space).
The total time the process’s threads have executed.
The number of clock cycles the process’s threads have consumed.
The process’s default base-scheduling priority (which starts as Normal, or 8, unless the parent process was set to Idle or Below Normal, in which case the setting is inherited).
The default processor affinity for the threads in the process.
The process swapping state (resident, out-swapped, or in transition).
The NUMA ideal node (initially set to 0).
The thread seed, based on the ideal processor that the kernel has chosen for this process (which is in turn based on the previously created process’s ideal processor, effectively distributing ideal processors in a round-robin manner). Creating a new process updates the seed in KeNodeBlock (the initial NUMA node block) so that the next new process will get a different ideal processor seed.
The initial value (or reset value) of the process default quantum, which is hard-coded to 6 until it is initialized later by PspComputeQuantumAndPriority. (Quantums are described in more detail in the “Thread Scheduling” section later in the chapter.)
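A few of these KPROCESS defaults are concrete enough to model. The sketch below is illustrative only: node_block, kprocess_defaults, and init_kprocess are hypothetical names, and advancing the seed by one modulo the processor count is an assumption, since the text says only that each new process gets a different seed. The numeric values 4, 6, and 8 are the standard base priorities of the Idle, Below Normal, and Normal classes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the seed kept in KeNodeBlock. */
typedef struct {
    uint32_t next_seed;
    uint32_t processor_count;   /* assumed nonzero */
} node_block;

/* Hypothetical subset of the KPROCESS fields listed above. */
typedef struct {
    int base_priority;
    int quantum_reset;
    uint32_t ideal_node;
    uint32_t thread_seed;
} kprocess_defaults;

/* Hands out the current seed and advances it (assumed round-robin step). */
static uint32_t take_process_seed(node_block *node)
{
    uint32_t seed = node->next_seed;
    node->next_seed = (seed + 1) % node->processor_count;
    return seed;
}

kprocess_defaults init_kprocess(node_block *node, int parent_base_priority)
{
    kprocess_defaults k;

    /* Normal (8) unless the parent was Idle (4) or Below Normal (6). */
    k.base_priority = (parent_base_priority == 4 || parent_base_priority == 6)
                          ? parent_base_priority : 8;
    k.quantum_reset = 6;   /* placeholder until PspComputeQuantumAndPriority */
    k.ideal_node = 0;      /* NUMA ideal node starts at 0 */
    k.thread_seed = take_process_seed(node);
    return k;
}
```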
Stage 3D: Concluding the Setup of the Process Address Space
Setting up the address space for a new process is somewhat complicated, so let’s look at what’s involved one step at a time. To get the most out of this section, you should have some familiarity with the internals of the Windows memory manager, which are described in Chapter 9.
The virtual memory manager sets the value of the process’s last trim time to the current time. The working set manager (which runs in the context of the balance set manager system thread) uses this value to determine when to initiate working set trimming.
The memory manager initializes the process’s working set list—page faults can now be taken.
The section (created when the image file was opened) is now mapped into the new process’s address space, and the process section base address is set to the base address of the image.
Ntdll.dll is mapped into the process.
Stage 3E: Setting Up the PEB
NtCreateUserProcess calls MmCreatePeb, which first maps the systemwide national language support (NLS) tables into the process’s address space. It next calls MiCreatePebOrTeb to allocate a page for the PEB and then initializes a number of fields, which are described in Table 5-7.
Table 5-7. Initial Values of the Fields of the PEB
Field | Initial Value |
ImageBaseAddress | Base address of section |
NumberOfProcessors | KeNumberProcessors kernel variable |
NtGlobalFlag | NtGlobalFlag kernel variable |
CriticalSectionTimeout | MmCriticalSectionTimeout kernel variable |
HeapSegmentReserve | MmHeapSegmentReserve kernel variable |
HeapSegmentCommit | MmHeapSegmentCommit kernel variable |
HeapDeCommitTotalFreeThreshold | MmHeapDeCommitTotalFreeThreshold kernel variable |
HeapDeCommitFreeBlockThreshold | MmHeapDeCommitFreeBlockThreshold kernel variable |
NumberOfHeaps | 0 |
MaximumNumberOfHeaps | (Size of a page – size of a PEB) / 4 |
ProcessHeaps | First byte after PEB |
MinimumStackCommit | MmMinimumStackCommitInBytes kernel variable |
ImageProcessAffinityMask | KeActiveProcessors or 1 << MmRotatingUniprocessorNumber kernel variable (for uniprocessor-only images) |
SessionId | Result of MmGetSessionId |
ImageSubSystem | OptionalHeader.Subsystem |
ImageSubSystemMajorVersion | OptionalHeader.MajorSubsystemVersion |
ImageSubSystemMinorVersion | OptionalHeader.MinorSubsystemVersion |
OSMajorVersion | NtMajorVersion kernel variable |
OSMinorVersion | NtMinorVersion kernel variable |
OSBuildNumber | NtBuildNumber kernel variable & 0x3FFF, combined with CmNtCSDVersion for service packs |
OSPlatformId | 2 |
However, if the image file specifies explicit Windows version or affinity values, this information replaces the initial values shown in Table 5-7. The mapping from image information fields to PEB fields is described in Table 5-8.
Table 5-8. Windows Replacements for Initial PEB Values
Field Name | Value Taken from Image Header |
OSMajorVersion | OptionalHeader.Win32VersionValue & 0xFF |
OSMinorVersion | (OptionalHeader.Win32VersionValue >> 8) & 0xFF |
OSBuildNumber | (OptionalHeader.Win32VersionValue >> 16) & 0x3FFF, combined with ImageLoadConfigDirectory.CSDVersion |
OSPlatformId | (OptionalHeader.Win32VersionValue >> 30) ^ 0x2 |
ImageProcessAffinityMask | ImageLoadConfigDirectory.ProcessAffinityMask |
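The bit layout in Table 5-8 can be checked by decoding a sample value. This sketch implements exactly the masks and shifts in the table; peb_version_override and both functions are hypothetical names, the encoder exists only to build test values, and combining the build number with the CSD version is left out.

```c
#include <stdint.h>

/* Hypothetical holder for the PEB fields Table 5-8 overrides. */
typedef struct {
    uint32_t os_major;
    uint32_t os_minor;
    uint32_t os_build;        /* before the CSD version is combined in */
    uint32_t os_platform_id;
} peb_version_override;

/* Applies the masks and shifts from Table 5-8 to Win32VersionValue. */
peb_version_override decode_win32_version_value(uint32_t v)
{
    peb_version_override o;
    o.os_major       = v & 0xFF;
    o.os_minor       = (v >> 8) & 0xFF;
    o.os_build       = (v >> 16) & 0x3FFF;
    o.os_platform_id = (v >> 30) ^ 0x2;
    return o;
}

/* Inverse of the decoder: the platform ID is stored XORed with 2. */
uint32_t encode_win32_version_value(uint32_t major, uint32_t minor,
                                    uint32_t build, uint32_t platform_id)
{
    return (major & 0xFF) |
           ((minor & 0xFF) << 8) |
           ((build & 0x3FFF) << 16) |
           (((platform_id ^ 0x2) & 0x3) << 30);
}
```

Because of the XOR, a zero in the top two bits decodes to a platform ID of 2, the same value Table 5-7 uses by default. Version 6.0, build 6000, platform 2 encodes to 0x17700006.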
If the IMAGE_FILE_UP_SYSTEM_ONLY flag is set in the image header’s characteristics (indicating that the image can run only on a uniprocessor system), a single CPU is chosen for all the threads in this new process to run on. The selection process is performed by simply cycling through the available processors: each time this type of image is run, the next processor is used. In this way, these types of images are spread evenly across the processors.
If the image specifies an explicit processor affinity mask (for example, a field in the configuration header), this value is copied to the PEB and later set as the default process affinity mask.
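The two paragraphs above can be combined into a single affinity choice. This is a sketch under stated assumptions: up_rotation stands in for MmRotatingUniprocessorNumber, processors are assumed to be numbered 0 through n-1 with n at most 64, and checking the uniprocessor flag before an explicit mask is an assumed precedence that the text does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MmRotatingUniprocessorNumber. */
typedef struct {
    uint32_t rotating_number;
    uint32_t processor_count;   /* assumed 1..64, processors 0..n-1 active */
} up_rotation;

uint64_t choose_image_affinity(up_rotation *rot, bool up_system_only,
                               uint64_t image_affinity_mask,
                               uint64_t active_processors)
{
    if (up_system_only) {
        /* One CPU for the whole process; the next such image gets the next CPU. */
        uint64_t mask = UINT64_C(1) << rot->rotating_number;
        rot->rotating_number = (rot->rotating_number + 1) % rot->processor_count;
        return mask;
    }
    if (image_affinity_mask != 0)
        return image_affinity_mask;   /* explicit mask from the image */
    return active_processors;         /* KeActiveProcessors */
}
```

Only uniprocessor-only images advance the rotation, so ordinary images launched in between do not disturb the round-robin spread.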
Stage 3F: Completing the Setup of the Executive Process Object (PspInsertProcess)
Before the handle to the new process can be returned, a few final setup steps must be completed, which are performed by PspInsertProcess and its helper functions:
If systemwide auditing of processes is enabled (either as a result of local policy settings or group policy settings from a domain controller), the process’s creation is written to the Security event log.
If the parent process was contained in a job, the job is recovered from the job level set of the parent and then bound to the session of the newly created process. Finally, the new process is added to the job.
PspInsertProcess inserts the new process block at the end of the Windows list of active processes (PsActiveProcessHead).
The process debug port of the parent process is copied to the new child process, unless the NoDebugInherit flag is set (which can be requested when creating the process). If a debug port was specified, it is attached to the new process at this time.
Finally, PspInsertProcess notifies any registered callback routines, creates a handle for the new process by calling ObOpenObjectByPointer, and then returns this handle to the caller.
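PsActiveProcessHead heads a circular, doubly linked list, so inserting the new process “at the end” means linking it just before the head. The sketch below mirrors the WDK’s LIST_ENTRY layout and the behavior of its InsertTailList and CONTAINING_RECORD macros in portable C; process is a hypothetical, trimmed-down stand-in for EPROCESS, and none of this is the kernel’s own code.

```c
#include <stddef.h>

/* Same shape as the kernel's LIST_ENTRY: forward and backward links. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Hypothetical, trimmed-down process object with embedded links. */
typedef struct {
    unsigned long pid;
    list_entry active_links;
} process;

/* An empty list is a head that points at itself in both directions. */
void init_list_head(list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

/* Tail insert: the entry goes between the current last entry and the head. */
void insert_tail_list(list_entry *head, list_entry *entry)
{
    list_entry *last = head->blink;
    entry->flink = head;
    entry->blink = last;
    last->flink = entry;
    head->blink = entry;
}

/* Recovers the containing process from its embedded links. */
process *process_from_links(list_entry *links)
{
    return (process *)((char *)links - offsetof(process, active_links));
}
```

Embedding the links in the object means insertion needs no allocation, which matters for code that runs on every process creation.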
Stage 4: Creating the Initial Thread and Its Stack and Context
At this point, the Windows executive process object is completely set up. It still has no thread, however, so it can’t do anything yet; creating that thread is the next step. Normally, the PspCreateThread routine is responsible for all aspects of thread creation and is called by NtCreateThread when a new thread is being created. However, because the initial thread is created internally by the kernel without user-mode input, the two helper routines that PspCreateThread relies on are used instead: PspAllocateThread and PspInsertThread.
PspAllocateThread handles the actual creation and initialization of the executive thread object itself, while PspInsertThread handles the creation of the thread handle and security attributes and the call to KeStartThread to turn the executive object into a schedulable thread on the system. However, the thread won’t do anything yet—it is created in a suspended state and isn’t resumed until the process is completely initialized (as described in Stage 5).
PspAllocateThread performs the following steps:
An executive thread block (ETHREAD) is created and initialized.
Before the thread can execute, it needs a stack and a context in which to run, so these are set up. The stack size for the initial thread is taken from the image—there’s no way to specify another size.
The thread environment block (TEB) is allocated for the new thread.
The user-mode thread start address is stored in the ETHREAD. This is the system-supplied thread startup function in Ntdll.dll (RtlUserThreadStart). The user’s specified Windows start address is stored in the ETHREAD block in a different location so that debugging tools such as Process Explorer can query the information.
KeInitThread is called to set up the KTHREAD block. The thread’s initial and current base priorities are set to the process’s base priority, and its affinity and quantum are set to those of the process. This function also sets the initial thread ideal processor. (See the section “Ideal and Last Processor” for a description of how this is chosen.) KeInitThread next allocates a kernel stack for the thread and initializes the machine-dependent hardware context for the thread, including the context, trap, and exception frames. The thread’s context is set up so that the thread will start in kernel mode in KiThreadStartup. Finally, KeInitThread sets the thread’s state to Initialized and returns to PspAllocateThread.
Once that work is finished, NtCreateUserProcess will call PspInsertThread to perform the following steps:
A thread ID is generated for the new thread.
The thread count in the process object is incremented, and the thread is added into the process thread list.
The thread is put into a suspended state.
The object is inserted and any registered thread callbacks are called.
The handle is created with ObOpenObjectByName.
The thread is readied for execution by calling KeStartThread.
Stage 5: Performing Windows Subsystem–Specific Post-Initialization
Once NtCreateUserProcess returns with a success code, all the necessary executive process and thread objects have been created. Kernel32.dll now performs various Windows subsystem–specific operations to finish initializing the process.
First of all, various checks are made for whether Windows should allow the executable to run. These checks include validating the image version in the header and checking whether Windows application certification has blocked the process (through a group policy). On specialized editions of Windows Server 2008, such as Windows Web Server 2008 and Windows HPC Server 2008, additional checks are made to see if the application imports any disallowed APIs.
If software restriction policies dictate, a restricted token is created for the new process. Afterward, the application compatibility database is queried to see if an entry exists in either the registry or system application database for the process. Compatibility shims will not be applied at this point—the information will be stored in the PEB once the initial thread starts executing (Stage 6).
At this point, Kernel32.dll sends a message to the Windows subsystem so that it can set up SxS information (see the end of this section for more information on side-by-side assemblies) such as manifest files, DLL redirection paths, and out-of-process execution for the new process. It also initializes the Windows subsystem structures for the process and initial thread. The message includes the following information:
Process and thread handles
Entries in the creation flags
ID of the process’s creator
Flag indicating whether the process belongs to a Windows application (so that Csrss can determine whether or not to show the startup cursor)
UI language information
DLL redirection and .local flags
Manifest file information
The Windows subsystem performs the following steps when it receives this message:
CsrCreateProcess duplicates a handle for the process and thread. In this step, the usage count of the process and the thread is incremented from 1 (which was set at creation time) to 2.
If a process priority class isn’t specified, CsrCreateProcess sets it according to the algorithm described earlier in this section.
The Csrss process block is allocated.
The new process’s exception port is set to be the general function port for the Windows subsystem so that the Windows subsystem will receive a message when a second-chance exception occurs in the process. (For further information on exception handling, see Chapter 3.)
The Csrss thread block is allocated and initialized.
CsrCreateThread inserts the thread in the list of threads for the process.
The count of processes in this session is incremented.
The process shutdown level is set to 0x280 (the default process shutdown level—see SetProcessShutdownParameters in the MSDN Library documentation for more information).
The new process block is inserted into the list of Windows subsystem-wide processes.
The per-process data structure used by the kernel-mode part of the Windows subsystem (W32PROCESS structure) is allocated and initialized.
The application start cursor is displayed. This cursor is the familiar rolling doughnut shape—the way that Windows says to the user, “I’m starting something, but you can use the cursor in the meantime.” If the process doesn’t make a GUI call after 2 seconds, the cursor reverts to the standard pointer. If the process does make a GUI call in the allotted time, CsrCreateProcess waits 5 seconds for the application to show a window. After that time, CsrCreateProcess will reset the cursor again.
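The 0x280 shutdown level set above only matters relative to other processes. According to the SetProcessShutdownParameters documentation, processes with higher levels are shut down first. The sketch below models only that ordering rule; csr_process and order_for_shutdown are hypothetical and say nothing about how Csrss actually stores or walks its process list.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_SHUTDOWN_LEVEL 0x280u   /* default level from the text */

/* Hypothetical per-process record. */
typedef struct {
    unsigned long pid;
    unsigned int shutdown_level;
} csr_process;

/* Comparator placing higher shutdown levels first. */
static int by_level_desc(const void *a, const void *b)
{
    unsigned int la = ((const csr_process *)a)->shutdown_level;
    unsigned int lb = ((const csr_process *)b)->shutdown_level;
    return (la < lb) - (la > lb);
}

/* Sorts processes into the order in which they would be shut down. */
void order_for_shutdown(csr_process *procs, size_t n)
{
    qsort(procs, n, sizeof procs[0], by_level_desc);
}
```

Because qsort is not stable, processes that share a level (such as the many that keep the 0x280 default) come out in unspecified relative order, which matches the fact that the level is the only ordering the API promises.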
After Csrss has performed these steps, CreateProcess checks whether the process was run elevated (which means it was executed through ShellExecute and elevated by the AppInfo service after the consent dialog box was shown to the user). This includes checking whether the process was a setup program. If it was, the process’s token is opened, and the virtualization flag is turned on so that the application is virtualized. (See the information on UAC and virtualization in Chapter 6.) If the application contained elevation shims or had a requested elevation level in its manifest, the process is destroyed and an elevation request is sent to the AppInfo service. (See Chapter 6 for more information on elevation.)
Note that most of these checks are not performed for protected processes; because these processes must have been designed for Windows Vista or later, there’s no reason why they should require elevation, virtualization, or application compatibility checks and processing. Additionally, allowing mechanisms such as the shim engine to use its usual hooking and memory patching techniques on a protected process would result in a security hole if someone could figure out how to insert arbitrary shims that modify the behavior of the protected process.
Stage 6: Starting Execution of the Initial Thread
At this point, the process environment has been determined, resources for its threads to use have been allocated, the process has a thread, and the Windows subsystem knows about the new process. Unless the caller specified the CREATE_SUSPENDED flag, the initial thread is now resumed so that it can start running and perform the remainder of the process initialization work that occurs in the context of the new process (Stage 7).
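The hand-off between Stage 4 (thread created suspended) and Stage 6 (resumed unless CREATE_SUSPENDED) can be modeled with a single suspend counter. This is a simplified sketch: thread_model and the stage functions are hypothetical, and starting the count at exactly 1 is an assumption of the model; only the CREATE_SUSPENDED value is taken from winbase.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define CREATE_SUSPENDED 0x00000004u   /* winbase.h value */

/* Hypothetical thread state: runnable only when the count reaches zero. */
typedef struct {
    uint32_t suspend_count;
} thread_model;

/* Stage 4: the initial thread is inserted in a suspended state. */
void stage4_create_initial_thread(thread_model *t)
{
    t->suspend_count = 1;
}

/* Stage 6: resume the thread unless the caller asked to keep it suspended. */
void stage6_maybe_resume(thread_model *t, uint32_t creation_flags)
{
    if (!(creation_flags & CREATE_SUSPENDED) && t->suspend_count > 0)
        t->suspend_count--;
}

/* A later resume by the caller performs the same decrement. */
void resume_thread(thread_model *t)
{
    if (t->suspend_count > 0)
        t->suspend_count--;
}

bool can_run(const thread_model *t)
{
    return t->suspend_count == 0;
}
```

With CREATE_SUSPENDED, the caller usually finishes its own setup and then calls ResumeThread on the thread handle that CreateProcess returned, which corresponds to resume_thread here.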
Stage 7: Performing Process Initialization in the Context of the New Process
The new thread begins life running the kernel-mode thread startup routine KiThreadStartup. KiThreadStartup lowers the thread’s IRQL from DPC/dispatch level to APC level and then calls the system initial thread routine, PspUserThreadStartup. The user-specified thread start address is passed as a parameter to this routine.
First, this function sets the Locale ID and the ideal processor in the TEB, based on the information present in kernel-mode data structures, and then it checks if thread creation actually failed. Next it calls DbgkCreateThread, which checks if image notifications were sent for the new process. If they weren’t, and notifications are enabled, an image notification is sent first for the process and then for the image load of Ntdll.dll. Note that this is done in this stage rather than when the images were first mapped, because the process ID (which is required for the callouts) is not yet allocated at that time.
Once those checks are completed, another check is performed to see whether the process is a debuggee. If it is, then PspUserThreadStartup checks if the debugger notifications have already been sent for this process. If not, then a create process message is sent through the debug object (if one is present) so that the process startup debug event (CREATE_PROCESS_DEBUG_INFO) can be sent to the appropriate debugger process. This is followed by a similar thread startup debug event and by another debug event for the image load of Ntdll.dll. DbgkCreateThread then waits for the Windows subsystem to get the reply from the debugger (via the ContinueDebugEvent function).
Now that the debugger has been notified, PspUserThreadStartup looks at the result of the initial check on the thread’s life. If it was killed on startup, the thread is terminated. This check is done after the debugger and image notifications to be sure that the kernel-mode and user-mode debuggers don’t miss information on the thread, even if the thread never got a chance to run.
Otherwise, the routine checks whether application prefetching is enabled on the system and, if so, calls the prefetcher (and Superfetch) to process the prefetch instruction file (if it exists) and prefetch pages referenced during the first 10 seconds the last time the process ran. (For details on the prefetcher and Superfetch, see Chapter 9.)
PspUserThreadStartup then checks if the systemwide cookie in the SharedUserData structure has been set up yet. If it hasn’t, it generates it based on a hash of system information such as the number of interrupts processed, DPC deliveries, and page faults. This systemwide cookie is used in the internal decoding and encoding of pointers, such as in the heap manager (for more information on heap manager security, see Chapter 9), to protect against certain classes of exploitation.
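The sketch below illustrates the general idea behind cookie-based pointer encoding: derive a secret value once from hard-to-predict system counters, then transform stored pointers with it so that an overwritten or leaked value is useless without the cookie. It is not the Windows implementation. The counter structure, the mixing function (a 64-bit hash finalizer), the XOR-then-rotate transform, and all names are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the counters the text mentions. */
typedef struct {
    uint64_t interrupt_count;
    uint64_t dpc_count;
    uint64_t page_fault_count;
} system_counters;

/* Assumed mixing step (a 64-bit hash finalizer), not Windows' hash. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 33;
    x *= UINT64_C(0xff51afd7ed558ccd);
    x ^= x >> 33;
    x *= UINT64_C(0xc4ceb9fe1a85ec53);
    x ^= x >> 33;
    return x;
}

/* Derives the systemwide cookie once from the counters. */
uint64_t make_cookie(const system_counters *c)
{
    return mix64(c->interrupt_count ^
                 mix64(c->dpc_count ^ mix64(c->page_fault_count)));
}

static uint64_t rotr64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v >> n) | (v << (64 - n)) : v;
}

static uint64_t rotl64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Encode: XOR with the cookie, then rotate by bits taken from the cookie. */
uint64_t encode_ptr(uint64_t p, uint64_t cookie)
{
    return rotr64(p ^ cookie, (unsigned)(cookie & 63));
}

/* Decode: undo the rotation, then undo the XOR. */
uint64_t decode_ptr(uint64_t e, uint64_t cookie)
{
    return rotl64(e, (unsigned)(cookie & 63)) ^ cookie;
}
```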
Finally, PspUserThreadStartup sets up the initial thunk context to run the image loader initialization routine (LdrInitializeThunk in Ntdll.dll), as well as the systemwide thread startup stub (RtlUserThreadStart in Ntdll.dll). These steps are done by editing the context of the thread in place and then issuing an exit from system service operation, which will load the specially crafted user context. The LdrInitializeThunk routine initializes the loader, heap manager, NLS tables, thread-local storage (TLS) and fiber-local storage (FLS) array, and critical section structures. It then loads any required DLLs and calls the DLL entry points with the DLL_PROCESS_ATTACH function code. (See the sidebar “Side-by-Side Assemblies” for a description of a mechanism Windows uses to address DLL versioning problems.)
Once the function returns, NtContinue will restore the new user context and return back to user mode—thread execution now truly starts.
RtlUserThreadStart then calls the application, using the address of the actual image entry point and the start parameter, both of which the kernel has already pushed onto the stack. This complicated series of events has two purposes. First, it allows the image loader inside Ntdll.dll to set up the process internally and behind the scenes so that other user-mode code can run properly (otherwise, it would have no heap, no thread local storage, and so on).
Second, having all threads begin in a common routine allows them to be wrapped in exception handling, so that when they crash, Ntdll.dll is aware of that and can call the unhandled exception filter inside Kernel32.dll. It is also able to coordinate thread exit on return from the thread’s start routine and to perform various cleanup work. Application developers can also call SetUnhandledExceptionFilter to add their own unhandled exception handling code.
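Because every thread starts in the same routine, one wrapper can catch unhandled exceptions and turn a start routine’s return value into a thread exit. Windows does this with structured exception handling, which portable C lacks, so the sketch below substitutes setjmp/longjmp. Here user_thread_start, raise_unhandled, set_unhandled_filter, and the example routines are hypothetical stand-ins, not the Ntdll.dll or Kernel32.dll functions.

```c
#include <setjmp.h>
#include <stddef.h>

typedef unsigned long (*thread_start_fn)(void *param);
typedef void (*unhandled_filter_fn)(int exception_code);

static unhandled_filter_fn g_filter;   /* plays the role of the filter slot */
static jmp_buf *g_frame;               /* catch point of the running wrapper */
static int g_pending_code;             /* code of the simulated exception */
static unsigned long g_exit_code;      /* what the "thread" exited with */

/* Installs a new filter and returns the previous one. */
unhandled_filter_fn set_unhandled_filter(unhandled_filter_fn filter)
{
    unhandled_filter_fn previous = g_filter;
    g_filter = filter;
    return previous;
}

/* Simulates an unhandled exception; valid only inside user_thread_start. */
void raise_unhandled(int code)
{
    g_pending_code = code;
    longjmp(*g_frame, 1);
}

static void exit_thread(unsigned long code) { g_exit_code = code; }

/* Common entry point: every start routine runs inside this wrapper. */
unsigned long user_thread_start(thread_start_fn start, void *param)
{
    jmp_buf frame;
    jmp_buf *outer = g_frame;

    g_frame = &frame;
    if (setjmp(frame) == 0) {
        exit_thread(start(param));                /* normal return */
    } else {
        if (g_filter != NULL)
            g_filter(g_pending_code);             /* let the filter run */
        exit_thread((unsigned long)g_pending_code);
    }
    g_frame = outer;
    return g_exit_code;
}

/* Example start routines and filter for trying the wrapper out. */
static int g_filter_calls;
static void counting_filter(int code) { (void)code; g_filter_calls++; }
static unsigned long returns_42(void *param) { (void)param; return 42; }
static unsigned long crashes(void *param)
{
    (void)param;
    raise_unhandled(0xC5);
    return 0;   /* not reached */
}
```

A normal return from returns_42 becomes an exit code of 42; crashes never returns normally, but the wrapper still calls the installed filter and ends the thread with the exception code.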