DP Client and Hardware Sizing

Distribution Point (DP) sizing has always been a little tricky. A distribution point is basically a file/IIS server, so how do we size a machine whose performance depends on both the number of connections and the volume of file transfers?

Each customer will have a different number of clients and a different mix of small, medium and large packages, each with corresponding service level agreements (SLAs) for deployment.

Performance of BITS vs SMB

Generally, HTTP GET requests (as used by BITS, the Background Intelligent Transfer Service) will be faster than SMB. BITS also benefits the client machines, because downloads run in the background and use spare network bandwidth.

The goal of a DP is to service as many clients as possible; however, this varies greatly when “servicing” might mean a 5 MB patch or a 500 MB application. My suggestion is to size DPs so that the environment can meet 80% of the likely scenarios.

For example, have enough DPs to meet the following SLAs:

1. The deployment of patches (up to 10-15 MB) to all clients simultaneously.

2. Regular software distributions of LOB-type apps, up to, say, 100 MB, to 50% of your expected clients simultaneously.

3. Infrequent software distributions, up to, say, 1 GB, to 10% of your installed clients simultaneously.

What “simultaneously” means may also vary for each of these scenarios. For example, emergency patches may need to reach all clients as soon as possible (say, within a 1-2 hour window), whereas a regular distribution of LOB apps may be allowed to occur over a 2-4 hour window.
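Once the SLA windows are set, each scenario reduces to an aggregate transfer rate the DPs must sustain. The sketch below shows that arithmetic in Python. It assumes a hypothetical estate of 2,000 clients and hypothetical windows (2 hours for patches, 4 hours for LOB apps, and an assumed 8 hours for the 1 GB scenario, which the SLAs above don't specify). It treats 1 GB as 1,000 MB and ignores protocol overhead.

```python
def required_mb_per_sec(package_mb: float, clients: int, window_hours: float) -> float:
    """Aggregate rate (MB/s) the DPs must sustain to deliver `package_mb`
    to `clients` machines within `window_hours` (no protocol overhead)."""
    return package_mb * clients / (window_hours * 3600)


TOTAL_CLIENTS = 2000  # hypothetical estate size

# (scenario, package MB, percent of clients, window hours).
# Sizes and percentages follow the example SLAs; the windows are assumptions,
# and 1 GB is treated as 1,000 MB for simplicity.
SCENARIOS = [
    ("Patch to all clients", 15, 100, 2),
    ("LOB app to 50% of clients", 100, 50, 4),
    ("1 GB app to 10% of clients", 1000, 10, 8),
]

for name, size_mb, percent, hours in SCENARIOS:
    clients = TOTAL_CLIENTS * percent // 100
    rate = required_mb_per_sec(size_mb, clients, hours)
    print(f"{name}: {clients} clients, {rate:.1f} MB/s sustained")
```

These rates are averages over the whole window. They are therefore lower bounds on the capacity you need, before any allowance for contention on the network between the DP and its clients.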

Once you know these parameters, you can start building your design. The best way to test BITS/HTTP GET requests is to use a tool such as WCAT (Web Capacity Analysis Tool) for IIS to simulate thousands of client requests. Install WCAT on the test client machines, deploy your packages to the DP, and configure WCAT to make GET requests for them. You can configure WCAT to simulate X clients, each with Y connections, and so on.

While WCAT doesn’t simulate BITS on the client, it does test the DP and reports on the number of requests the DP can support. It also lets you capture performance counters.

Testing with this tool isn’t necessarily easy. However, if you have a medium to large environment and your manager expects you to deploy within SLAs like those defined above, this kind of testing is well worth the investment during the test and pilot phases.
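WCAT is driven by its own scenario and settings files, which aren't reproduced here. For a quick sanity check before setting WCAT up, the following minimal Python sketch fires concurrent HTTP GET requests at a package file on the DP and reports requests and megabytes per second. It is not WCAT, and it does not emulate BITS (no range requests, throttling or resume). A single Python process also won't generate the load of several dedicated WCAT client machines, so treat its numbers as a smoke test, not a capacity figure.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> int:
    """GET `url`, read the whole body in chunks, and return its size in bytes."""
    total = 0
    with urllib.request.urlopen(url, timeout=60) as resp:
        while chunk := resp.read(64 * 1024):
            total += len(chunk)
    return total


def run_load(url: str, clients: int, requests_per_client: int) -> dict:
    """Run `clients` concurrent workers, each issuing `requests_per_client`
    sequential GETs, and summarize the observed throughput."""

    def client_loop(client_id: int) -> int:
        # Each simulated client downloads the file back to back.
        return sum(fetch(url) for _ in range(requests_per_client))

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        byte_counts = list(pool.map(client_loop, range(clients)))
    elapsed = time.perf_counter() - start

    total_requests = clients * requests_per_client
    total_bytes = sum(byte_counts)
    return {
        "requests": total_requests,
        "bytes": total_bytes,
        "seconds": elapsed,
        "requests_per_sec": total_requests / elapsed,
        "mb_per_sec": total_bytes / elapsed / 1_000_000,
    }
```

Usage, with a hypothetical URL: `run_load("http://dp01.example.com/packages/patch.bin", clients=50, requests_per_client=10)`. Run it from several client machines at once and watch the DP's network and CPU counters while it runs.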

Performance Testing

From my testing during the SMS 2003 RTM timeframe, I was able to establish the following results.

Using the following hardware:

  • Server: dual 1 GHz processors with 512 MB RAM and a single SCSI disk for packages (other disks on the system held the OS etc.).
  • One machine acting as the server and additional machines acting as clients.
  • Clients using WCAT to simulate hundreds of connections (up to 500 in this case).

Test Results

With 1 x 100 Mbps network card (and a dedicated network subnet), the server was able to distribute ~4,000 patch files of 9 MB each within 1 hour.

With 2 x 100 Mbps network cards (and two separate dedicated network subnets), the same server was able to distribute ~8,000 patch files of 9 MB each within 1 hour.

During these runs, the server's CPU never exceeded 60%.

Network utilization on the server was 100% during both runs.
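A quick back-of-the-envelope check shows why these results point at the network. The Python below converts the approximate file counts into delivered megabytes per second and compares them with the raw line rate of a 100 Mbps card (8 bits per byte, protocol overhead ignored).

```python
def effective_mb_per_sec(files: int, file_mb: float, seconds: float) -> float:
    """Average delivered throughput in MB/s."""
    return files * file_mb / seconds


def nominal_mb_per_sec(link_mbps: float) -> float:
    """Raw line rate of a link in MB/s (8 bits per byte, no protocol overhead)."""
    return link_mbps / 8


one_nic = effective_mb_per_sec(4000, 9, 3600)   # ~4,000 x 9 MB patches in 1 hour
two_nics = effective_mb_per_sec(8000, 9, 3600)  # ~8,000 x 9 MB patches in 1 hour
line_rate = nominal_mb_per_sec(100)             # one 100 Mbps card

print(f"1 NIC:  {one_nic:.1f} MB/s ({one_nic / line_rate:.0%} of raw line rate)")
print(f"2 NICs: {two_nics:.1f} MB/s ({two_nics / (2 * line_rate):.0%} of raw line rate)")
```

Both runs work out to about 10 MB/s per card, or 80% of the 12.5 MB/s raw rate of a 100 Mbps link. That is close to the practical ceiling of such a card, while CPU stayed below 60%.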

Summary/Conclusions of Testing

As you can see, the network is the bottleneck, and it’s even worse under real-world conditions. It’s very unlikely that you, as an admin/consultant, can dictate the entire network infrastructure between the DP and the clients, so even the numbers above may be unachievable. On the upside, if you can use network resources such as multiple network cards on switched networks, it may be possible to achieve or even exceed these results. Once again, “It Depends”.

Adding more DPs may not be the answer either, as they are still at the mercy of the network. For additional DPs to work efficiently, they need to be on segmented and/or switched networks (all the way to the client base) to achieve aggregate throughput greater than the results above. This is especially a concern in a campus-type environment with little switching and segmentation.

For customers who place DPs in these types of environments, consider gigabit solutions.

What kind of maximum throughput can I get out of my network card?

Network Card Speed        Megabytes/Sec (approx.)
100 megabit (100 Mbps)    ~11
1 gigabit (1000 Mbps)     ~60
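Turning the table around gives a best-case estimate of how long one DP needs to push a package to a group of clients. The Python sketch below assumes that the DP's network cards are the only bottleneck, which, as noted above, real networks rarely allow. The 100 MB package and 2,000 clients are hypothetical.

```python
# Approximate usable throughput per card, taken from the table above (MB/s).
NIC_MB_PER_SEC = {"100 Mbps": 11, "1 Gbps": 60}


def delivery_hours(package_mb: float, clients: int,
                   nic_mb_per_sec: float, nics: int = 1) -> float:
    """Best-case hours for one DP to deliver `package_mb` to `clients`
    machines, assuming the DP's network cards are the only bottleneck."""
    total_mb = package_mb * clients
    return total_mb / (nic_mb_per_sec * nics) / 3600


# Hypothetical example: a 100 MB LOB app to 2,000 clients from one DP.
for label, rate in NIC_MB_PER_SEC.items():
    print(f"{label}: {delivery_hours(100, 2000, rate):.1f} hours")
```

That is roughly 5 hours on a single 100 Mbps card versus under an hour on gigabit. On 100 Mbps, a 2-4 hour LOB window for 2,000 clients would need more cards, more DPs, or fewer simultaneous clients.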

IMHO - My General Suggestions

  • I would suggest limiting each DP to around 1,000-2,000 clients. If you don’t intend to distribute large packages (such as Office, operating systems and service packs), you could consider increasing this to 4,000 clients per DP.
  • Even with only 1,000-2,000 clients per DP, I wouldn’t recommend distributing these large (100+ MB) packages to all clients at the same time. Remember, software is only part of the software distribution process. You need to manage your environment with policies and procedures that make the most efficient use of the resources you have deployed, while deploying the minimum hierarchy required to meet your internal customers’ SLAs.
  • Also consider the risk of deploying such applications to all users at the same time. No matter how well the application or the deployment/installation wrapper is written, there is always the possibility of things blowing up.
  • Always use a phased approach to deploying software wherever possible (emergency patches may sometimes be the exception).
  • First deploy the application in a test lab, then move to a controlled pilot and then finally to a deployment environment (which will generally utilize a rolling deployment strategy).
  • In all of these phases ensure you have a true representation of the machine configurations, both hardware and software, that you have deployed in production.
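The suggestions above can be turned into a rough planning sketch. The Python below counts DPs at a ceiling of 2,000 clients per DP (the upper end of the suggestion) and splits an estate into a pilot group followed by fixed-size rolling waves. The machine names, pilot size and wave size are hypothetical. In practice you might size each wave from the throughput a DP can actually deliver within your SLA window.

```python
import math


def dps_needed(total_clients: int, clients_per_dp: int = 2000) -> int:
    """Number of DPs needed to keep each DP at or below `clients_per_dp`."""
    return math.ceil(total_clients / clients_per_dp)


def plan_rollout(clients: list[str], pilot_size: int, wave_size: int) -> list[list[str]]:
    """Split clients into a pilot group followed by fixed-size rolling waves."""
    if pilot_size < 0 or wave_size <= 0:
        raise ValueError("pilot_size must be >= 0 and wave_size must be > 0")
    waves = [clients[:pilot_size]] if pilot_size else []
    rest = clients[pilot_size:]
    waves.extend(rest[i:i + wave_size] for i in range(0, len(rest), wave_size))
    return waves


machines = [f"PC{n:04d}" for n in range(4500)]  # hypothetical machine names
print("DPs needed:", dps_needed(len(machines)))
waves = plan_rollout(machines, pilot_size=50, wave_size=1000)
print("Wave sizes:", [len(w) for w in waves])
```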

Footnote: SSL Implementation

At present, the DP does not support SSL communications. However, I did find the following test results on the external IIS website:

The SSL implementation on Windows Server 2003 is architecturally different from SSL in previous Windows Server releases. The good news is that the SSL implementation on Windows Server 2003 is self-tuning and aware of the incoming load.

A frequently asked question is “How much will HTTPS (SSL) cost me compared to straight HTTP?” The following test was devised to demonstrate the cost of SSL. The test hardware was an 8-processor, 900 MHz server with 2 gigabit network cards. The SSL certificate key length was 1024 bits, the content was an 8 KB static file, and six requests were sent per connection. Each new connection performed a separate SSL handshake. The results of the test are below:

Metric            Without SSL   With SSL (1024-bit key)   Difference
Requests / Sec    9368.30       2462.10                   ~ -74%
CPU %             47.81         51.72                     ~ +8%

The test above was purposely designed to consume around 50% CPU. This shows what each configuration could achieve in throughput at a CPU level that would be acceptable in many operational environments. The conclusion, for this load type, is that HTTPS throughput is roughly a quarter of straight HTTP throughput. NOTE: The throughput will vary depending on the size of the data, the number of full SSL handshakes occurring, and the speed of the CPU.
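The “Difference” column and the “roughly a quarter” conclusion can be reproduced from the raw numbers. The Python below computes the relative change from HTTP to HTTPS for each metric, plus the HTTPS/HTTP throughput ratio.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


http_rps, https_rps = 9368.30, 2462.10   # requests / sec
http_cpu, https_cpu = 47.81, 51.72       # CPU %

print(f"Requests/sec change: {pct_change(http_rps, https_rps):.1f}%")
print(f"CPU change: {pct_change(http_cpu, https_cpu):+.1f}%")
print(f"HTTPS throughput as a fraction of HTTP: {https_rps / http_rps:.2f}")
```

The throughput drop comes out at about 74% and the CPU increase at about 8%, so HTTPS delivers a little over a quarter of the HTTP request rate for this load.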

SSL Accelerator cards

An SSL accelerator card can be a very effective acquisition to offload the CPU-intensive computations associated with SSL. SSL accelerator cards are now relatively cheap, and can be a great investment for improving the SSL throughput on a server.

A test was performed to gauge the throughput-enhancing capabilities of an SSL accelerator card on a new-generation, very fast computer. The target server had one 3 GHz Hyper-Threading (HT) processor (2 logical processors), a 100-megabit network card and 512 MB of RAM.

Each SSL connection made six requests to content that returned a 2 KB response, with a full SSL handshake after every six requests. The SSL public key length was 1024 bits. The results are below:

Metric            SSL in Software   SSL Accelerator Card   Difference
Requests / Sec    1281.39           3075.20                +139.99%
CPU %             100.00            76.90                  -23.10%

As the table above shows, the SSL accelerator card had a significant impact on this load. Even with a high-speed processor, the SSL accelerator card substantially outperforms SSL in software. The card could have done even better in terms of throughput, but the network interface was a bottleneck in the test.

Note: This test is not a real-world test. It is designed to show the best-case effect of SSL accelerator cards.
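One way to see the offload effect is to normalize throughput by CPU consumed. The Python below divides requests/sec by CPU % for each column of the accelerator table. This is a derived figure, not a measurement reported by the test. Because the software run was pinned at 100% CPU, its true CPU demand may have been higher, so treat the ratio as approximate.

```python
def requests_per_cpu_percent(requests_per_sec: float, cpu_percent: float) -> float:
    """Requests/sec delivered per percentage point of CPU consumed."""
    return requests_per_sec / cpu_percent


software = requests_per_cpu_percent(1281.39, 100.00)  # SSL in software
card = requests_per_cpu_percent(3075.20, 76.90)       # SSL accelerator card

print(f"SSL in software:      {software:.1f} req/s per CPU %")
print(f"SSL accelerator card: {card:.1f} req/s per CPU %")
print(f"Improvement:          {card / software:.1f}x")
```

On these numbers, the card delivers roughly three times as many requests per unit of CPU as software SSL.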