Closing the race window between creating a suspended process and putting it in a job


A customer had a test harness that spawns a very large number of processes. To make sure everything gets cleaned up if the test harness closes unexpectedly, they start each process suspended, then place it in a job object before resuming it. The job object is marked as JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE so that when the last handle to the job is closed, the processes in the job are terminated.

This was successful for the most part, except for the small race window when the process has been created suspended but has not been added to a job. A poorly-timed cancellation of the test harness would leave the zombie process behind: Suspended, but with nobody around to terminate it.

The customer was looking for ideas to close this last remaining race window.

My suggestion was to put the test harness in the job, too! This entails splitting the test harness into two processes. The first process is what the test infrastructure launches. It doesn't actually run any tests, but rather creates the environment for tests to run. Put all the test harness code in the second process.

The first process creates the kill-on-close job, creates the second process (not suspended), passing it a handle to itself and a handle to an event. The first process puts the second process in the job and then signals the event, to tell the second process, "You're in a job now. You can do your thing."

The second process, when it starts up, waits for either of the passed-in handles to be signaled. If the handle to the first process is signaled, then it means that the test was canceled, and the second process should kill itself. If the event handle is signaled, then it means that the second process is now safely in a job and can start launching tests. Those tests will immediately belong to the job object, since they were created by a process already in the job. There is no "suspended process" window.

So now we have these points at which the test can be stopped:

Before the first process creates the second process, terminating the first process terminates the test.

After the first process creates the second process, but before it puts the second process into the job, terminating the first process will cause its own process handle to become signaled, and the second process responds by terminating itself. No tests have started yet.

After the second process is placed in the job, terminating the first process will cause the job handle to become closed, at which point everything in the job (the second process plus all of the tests) will be terminated.

If it's essential that the tests run in a separate job from the test harness, the test harness can create a second job for the tests themselves. It creates the tests suspended, then moves them into the second job. The tests always belong to some job, so they will get terminated eventually.

Comments (6)
  1. SpecLad says:

    There’s still a small possibility of a leaked process here: if the first process is terminated before it can put the second process into the job, and the second process hangs for whatever reason before it can exit. It’s quite unlikely, of course, but the possibility is there.

    1. skSdnW says:

      If WaitForMultipleObjects or ExitProcess/TerminateProcess fails then you have more to worry about than just a leaked process, you must have a bug in a kernel driver or something.

  2. uffa8 says:

    I can't understand why two processes are needed here. “The first process puts the second process in the job”: why doesn't the second process create the kill-on-close job itself and assign itself to it? All tests would then be assigned to this job automatically, and the first process would no longer be needed. So why hand this task to a separate “first” process?
    Also, before Windows 8 a process can be associated with only a single job, so before Windows 8 we cannot run the tests in a separate job. Starting with Windows 8, we can.

    1. Perhaps the process might not have permission to put itself into a job, but would have permission for a process that it created?

      1. uffa8 says:

        Of course not. To call AssignProcessToJobObject, the process handle needs the PROCESS_SET_QUOTA and PROCESS_TERMINATE access rights. The pseudo-handle from GetCurrentProcess() has all of these, so there is no problem creating a kill-on-close job yourself and assigning yourself to it. Using an additional, separate process for this makes no sense.

      2. uffa8 says:

        A process can create a job object itself and assign itself to it. The problem was before Windows 8, if the process was already in a job, because before Windows 8 a process could be associated with only a single job. Only in that case does it make sense to use two processes. But the first process is needed only to start the second with the CREATE_BREAKAWAY_FROM_JOB flag; it need not create the job, the event, and so on. So we must begin by calling IsProcessInJob. If the answer is no, run as a single process. If yes, call CreateProcess (for our own exe) with the CREATE_BREAKAWAY_FROM_JOB flag. This works only if the job allows breakaway (JOB_OBJECT_LIMIT_BREAKAWAY_OK must be set for the job); otherwise CreateProcess fails with an access denied error. In any case, for “If it’s essential that the tests run in a separate job from the test harness, the test harness can create a second job for the tests themselves” we need Windows 8 or later, and there a process that is already in a job at startup is not a problem.

        Simple implementation: https://pastebin.com/Gh6Qqkq4

        void JobTest()
        {
            MessageBoxW(0, 0, L"Begin Test", 0);

            BOOL bInJob;
            if (!IsProcessInJob(GetCurrentProcess(), 0, &bInJob))
            {
                GetLastError(); // result unused; inspect in a debugger
                return;
            }

            STARTUPINFOW si = { sizeof(si) };
            PROCESS_INFORMATION pi;
            WCHAR ApplicationName[MAX_PATH];

            if (bInJob)
            {
                // try to break away; needed only before Windows 8
                if (GetModuleFileNameW(0, ApplicationName, RTL_NUMBER_OF(ApplicationName)))
                {
                    if (CreateProcessW(ApplicationName, 0, 0, 0, FALSE, CREATE_BREAKAWAY_FROM_JOB, 0, 0, &si, &pi))
                    {
                        CloseHandle(pi.hThread);
                        CloseHandle(pi.hProcess);
                    }
                }

                GetLastError();
                return;
            }

            if (HANDLE hJob = CreateJobObjectW(0, 0))
            {
                // zero-initialize so that unused limit fields are not garbage
                JOBOBJECT_EXTENDED_LIMIT_INFORMATION jbli = {};
                jbli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;

                if (SetInformationJobObject(hJob, JobObjectExtendedLimitInformation, &jbli, sizeof(jbli)) &&
                    AssignProcessToJobObject(hJob, GetCurrentProcess()))
                {
                    if (HANDLE hNestedJob = CreateJobObjectW(0, 0))
                    {
                        jbli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_ACTIVE_PROCESS;
                        jbli.BasicLimitInformation.ActiveProcessLimit = 1;

                        if (SetInformationJobObject(hNestedJob, JobObjectBasicLimitInformation,
                            &jbli.BasicLimitInformation, sizeof(jbli.BasicLimitInformation)))
                        {
                            if (GetEnvironmentVariableW(L"ComSpec", ApplicationName, RTL_NUMBER_OF(ApplicationName)))
                            {
                                // the new process is automatically assigned to the kill-on-close hJob
                                if (CreateProcessW(ApplicationName, 0, 0, 0, FALSE, CREATE_SUSPENDED, 0, 0, &si, &pi))
                                {
                                    // before Windows 8 this fails anyway (no nested jobs)
                                    if (AssignProcessToJobObject(hNestedJob, pi.hProcess))
                                    {
                                        // cmd.exe runs, but cannot launch any process itself
                                        // (ActiveProcessLimit is 1); for example, the help command fails
                                        ResumeThread(pi.hThread);
                                    }
                                    else
                                    {
                                        TerminateProcess(pi.hProcess, GetLastError());
                                    }
                                    CloseHandle(pi.hThread);
                                    CloseHandle(pi.hProcess);
                                }
                                GetLastError();
                            }
                        }
                        CloseHandle(hNestedJob);
                    }
                }

                MessageBoxW(0, 0, L"End Test", 0);
                // closing the last handle terminates everything in the job
                CloseHandle(hJob);
            }
        }
