A Model for cold startup time of an application on Windows.

Well, it has been a while. I see now that it is getting close to half a year since I last posted. Sigh. It is so easy to get busy with other things and not blog. I have resolved to be a bit more methodical about it and ensure that I write something every week or so.

Over the past year I have been doing quite a bit of work with the Common Language Runtime trying to improve its cold startup time. This has been a common complaint, and we knew when we shipped version 2.0 of the runtime that there was a lot we could do. Unfortunately, it may be a while before the wider world sees the benefit of this work (our ship cycle is quite long, and the next public release of the runtime has no major changes to the ‘core’ part of the runtime that I work on). Nevertheless, I have learned some useful things so far in this work and I thought I would write some of them down.

When we first seriously looked at cold startup time we had two problems:

  1. Our measurements of cold startup were VERY noisy.  This makes work quite difficult. 

  2. We did not have a clear idea how to prioritize work.  We knew Disk I/O was important, but how important?  Were disk seeks or data transfer time most important?  We needed to have a good model of what contributed to cold startup time before we could come up with an attack plan and prioritize the work properly. 

The good news is that we came up with a simple model that we are pretty happy with. It helped solve both of these problems, and I would like to share it with you.

First: What is cold startup time?

Cold startup time is defined to be the time it takes an application to start up (become responsive to user input) on a machine that has just been rebooted. The important characteristic of cold startup is that any data the application needs (mostly the program itself) has to be read off the disk because it is not yet in the operating system’s disk cache. Thus cold startup has a very strong component of disk I/O time associated with this data fetch. In contrast, warm startup time is defined as application startup time on the second launch. Under normal circumstances warm startup has almost NO disk I/O because literally everything the program needs is in cache, and thus the program never needs to go to disk to get anything.

This leads to a very nice formula:

  • ColdStartup = WarmStartup + DiskIOTime

This formula is accurate to a very good approximation.   Moreover DiskIOTime can be broken into two parts:  The time it takes to move the disk head and for the disk to rotate so that the correct data is under the head (Seek Time), and the time it takes to actually transfer the data.   Thus

  • DiskIOTime = DiskSeekTime + DiskDataTransferTime.

These quantities can also be modeled well. Disk seek time does depend on how far the disk head needs to travel. However, disks are now quite large (100s of GBytes), and the operating system does try to keep files that are used together close together on the disk (more on that in a later blog entry if there is interest), so the amount of head movement is typically very small as a percentage of the total possible disk travel. Moreover, whenever there is ANY movement of the disk head, it takes a significant amount of time for the head to ‘settle’ so that an accurate read can happen. Thus to a good approximation, the seek time is independent of how far the disk head moves, and can be accurately modeled by a constant. Some empirical data gathering shows that for the machines that we typically use this time is about 4 msec per movement. Thus

  •  DiskSeekTimeMsec = NumberOfSeeks * 4

The data transfer time can also be modeled accurately. It is completely determined by the rotational speed of the disk, the number of read heads, and the density of the bits on the disk (all of which are constant for a given disk). Typical disks that we use give transfer speeds of about 50 MBytes/sec. This means that it takes 1/50 of a second (20 msec) to transfer 1 MByte of data. Thus

  • DiskTransferTimeMsec = NumberMBytes * 20

Putting this all together, and counting each disk read as one seek, gives the formula

  • ColdStartupTimeMSec = WarmStartupTimeMsec + 4 * NumberOfReads + 20 * NumberMBytes
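As a minimal sketch, the formula can be written as a small Python function. The function and parameter names are mine, chosen for illustration, and the sample inputs are made up; only the 4 msec and 20 msec/MByte constants come from the discussion above.

```python
SEEK_MSEC_PER_READ = 4.0         # empirical seek cost per disk read (from the text)
TRANSFER_MSEC_PER_MBYTE = 20.0   # 50 MBytes/sec means 20 msec per MByte


def cold_startup_msec(warm_startup_msec, number_of_reads, number_mbytes):
    """Estimate cold startup time in msec using the model in this post."""
    seek_msec = SEEK_MSEC_PER_READ * number_of_reads
    transfer_msec = TRANSFER_MSEC_PER_MBYTE * number_mbytes
    return warm_startup_msec + seek_msec + transfer_msec


# Hypothetical app: 300 msec warm startup, 500 disk reads, 10 MBytes read.
estimate = cold_startup_msec(300, 500, 10)
print(estimate)  # 300 + 2000 + 200 = 2500.0
```

In this made-up case the seek term (2000 msec) is far larger than the other two, which is the kind of breakdown the model is meant to expose.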

This formula shows that you can attack cold startup in one of three ways: reduce the warm startup time (computations done at startup by the app), reduce the amount of data read from the disk (typically dominated by the amount of UNIQUE code used at startup), or reduce the number of disk reads (typically by packing the code used in the startup path so that it is all together and can be read from the disk in one read).
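To make the three options concrete, here is a rough sketch that splits the estimate into its three terms and checks how much each kind of improvement would save. The baseline numbers and the "halve one input" experiments are hypothetical, not measurements from the runtime.

```python
def breakdown_msec(warm_msec, reads, mbytes, seek_msec=4.0, msec_per_mbyte=20.0):
    """Return each term of the model separately so the terms can be compared."""
    return {
        "warm": float(warm_msec),
        "seek": seek_msec * reads,
        "transfer": msec_per_mbyte * mbytes,
    }


def total_msec(terms):
    """Sum the three terms back into a single cold startup estimate."""
    return sum(terms.values())


# Hypothetical baseline: 300 msec warm startup, 500 reads, 10 MBytes read.
baseline = breakdown_msec(warm_msec=300, reads=500, mbytes=10)

# Three hypothetical improvements, each halving exactly one input.
candidates = {
    "halve warm startup": breakdown_msec(150, 500, 10),
    "halve bytes read": breakdown_msec(300, 500, 5),
    "halve number of reads": breakdown_msec(300, 250, 10),
}

for name, terms in candidates.items():
    saved = total_msec(baseline) - total_msec(terms)
    print(f"{name}: saves {saved:.0f} msec")
```

For these made-up inputs, halving the number of reads saves 1000 msec, while halving the warm startup time or the bytes read saves 150 msec and 100 msec respectively. A different application could easily rank the three the other way around.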

Now the formula does depend on disk technology, but typically it will not vary in huge ways (by more than a factor of 2) from one disk to another (assuming hardware of a certain quality). In fact one of the nice benefits of the formula is that it is MACHINE INDEPENDENT (assuming the warm startup time is at least approximately hardware independent). Thus measurements on one machine can be compared to those on different hardware (you can’t do that with time). Effectively the formula calculates the cold startup time on a theoretical piece of hardware (one that happens to have the specs of the machine I used to derive the 4 msec and 20 msec/MByte numbers).
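One way to picture the "theoretical piece of hardware" is to keep the disk constants in a small profile object. This is only a sketch: `DiskProfile`, `REFERENCE_DISK`, and `FASTER_DISK` are names I made up, and the faster disk's numbers are hypothetical. Only the reference values (4 msec per read, 50 MBytes/sec) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskProfile:
    seek_msec: float        # cost per read, treated as a constant
    mbytes_per_sec: float   # sustained transfer rate

    @property
    def msec_per_mbyte(self):
        # 50 MBytes/sec gives 1000 / 50 = 20 msec per MByte
        return 1000.0 / self.mbytes_per_sec


REFERENCE_DISK = DiskProfile(seek_msec=4.0, mbytes_per_sec=50.0)  # from the text
FASTER_DISK = DiskProfile(seek_msec=3.0, mbytes_per_sec=80.0)     # hypothetical


def cold_startup_msec(warm_msec, reads, mbytes, disk=REFERENCE_DISK):
    """Model estimate for a given disk; defaults to the reference hardware."""
    return warm_msec + disk.seek_msec * reads + disk.msec_per_mbyte * mbytes


# The same I/O counts evaluated against two different disks.
print(cold_startup_msec(300, 500, 10))               # 2500.0
print(cold_startup_msec(300, 500, 10, FASTER_DISK))  # 300 + 1500 + 125 = 1925.0
```

Because the read count and byte count do not depend on the disk, two runs taken on different machines can both be evaluated against `REFERENCE_DISK` and compared directly.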

This solved both of our original problems (noise and prioritizing dev work). The main reason cold startup is noisy is that the disk is being shared by ALL apps in the system, and thus their activity can perturb the time. However, the operating system already tracks the metrics above (the number of read I/Os and the number of I/O bytes) and breaks them down PER PROCESS, which means these I/O measurements have NO NOISE associated with other processes’ activity. This is GREAT and reduces the noise SUBSTANTIALLY. It solved the prioritizing problem because it gives us the first level of breakdown of what contributes to cold startup. We therefore know which term dominates for a given scenario (typically none of the terms can be ignored), and thus where we can get the best benefit (for a given cost).
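As a sketch of how the per-process counters feed the model: on Windows these counts are available, for example, through the Win32 `GetProcessIoCounters` function, whose `IO_COUNTERS` structure includes `ReadOperationCount` and `ReadTransferCount`. The code below does not call that API. It takes the two numbers as plain inputs, and the `ReadCounters` type, the sample values, and the choice of 1024 * 1024 bytes per MByte are my assumptions.

```python
from dataclasses import dataclass

# Assumption: the post does not say which definition of MByte it uses.
BYTES_PER_MBYTE = 1024 * 1024


@dataclass
class ReadCounters:
    read_operations: int   # number of read I/Os issued by the process
    read_bytes: int        # total bytes read by the process


def modeled_disk_io_msec(counters, seek_msec=4.0, msec_per_mbyte=20.0):
    """Convert per-process read counters into the modeled DiskIOTime in msec."""
    mbytes = counters.read_bytes / BYTES_PER_MBYTE
    return seek_msec * counters.read_operations + msec_per_mbyte * mbytes


# Hypothetical counters sampled once the app is responsive.
at_responsive = ReadCounters(read_operations=500, read_bytes=10 * BYTES_PER_MBYTE)
print(modeled_disk_io_msec(at_responsive))  # 2000 + 200 = 2200.0
```

Adding the measured warm startup time to this value gives the modeled cold startup time. Since only this process's counters are used, disk activity from other applications does not affect the result.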

Next time I will talk about how these numbers play out in a real scenario, and how that led us to prioritize the work.

It’s good to be back!



Comments (4)

  1. AtulGupta says:

    Interesting insight. Looking forward to more information.

  2. dimkaz says:

    Hi Vance,

    are you sure the following is not a big oversimplification?

    ColdStartup = WarmStartup + DiskIOTime.

    Considering that most I/O done on startup (such as DLL loading) is done synchronously, and given multiple threads.

    As part of optimizing our startup we have the following split. (simplified)

    We break our startup into multiple threads that touch different code (DLLs).

    On warm startup our 2x cores on the startup are almost at 100%.

    On the cold startup cpu utilization goes rarely to 60%. I should also note that we are not using ngen and the jit time is a factor here as well (increase CPU time)

    My rough estimate is that in my case you would be off by as much as 20%.

    What other scenarios do you consider? Is ngen case your main focus? The good old technique of ordering functions is gone from .net, will it return? What about metadata?

    Please continue to post. I always enjoy your articles.

  3. After my last blog entry on cold startup a reader ( dimkaz ) worried that the formula would not be accurate

  4. What’s Coming in .NET Runtime Performance in Version V3.5 SP1 It certainly has been a while since I last