Measuring Processor Utilization and Queuing Delays in Windows applications

Continuing my answer to the mail I received recently from Uriel Carrasquilla… Uri’s note, reprinted in the previous post, refers to an “issue” associated with the current technique for measuring processor utilization in Windows. As my reply mentioned, these are documented and well-understood issues. At the core is the methodology used to calculate processor utilization…

Parallel Scalability Isn’t Child’s Play, Part 3: The Problem with Fine-Grained Parallelism

In the last blog entry in this series, I introduced the model for parallel program scalability proposed by Neil Gunther, which I praised for being a realistic antidote to more optimistic, but better known, formulas. Gunther’s model adds a new parameter to the more familiar Amdahl’s law. The additional parameter k, representing coherence-related delays, enables…


Are we taking advantage of Parallelism?

Recently, a colleague of mine, Mark Friedman, posted a blog entry titled “Parallel Scalability Isn’t Child’s Play” in which he reviewed the merits of Amdahl’s Law vs. Gunther’s Law for determining the practical limits to parallelization. I would not argue with the premise of Mark’s blog that parallelism is not child’s play. However, I do have…

Parallel Scalability Isn’t Child’s Play, Part 2: Amdahl’s Law vs. Gunther’s Law

Part 1 of this series of blog entries discussed results from simulating the performance of a massively parallel SIMD application on several alternative multi-core architectures. These results were reported by researchers at Sandia Labs and publicized in a press release. Neil Gunther, my colleague from the Computer Measurement Group (CMG), referred to the Sandia findings…

PDC2008 preConference Workshop

Over the past several weeks, I have been working overtime developing a presentation on web application performance to be given at the upcoming Professional Developers Conference (PDC), which is next week in Los Angeles. This is partly why I have been remiss about blogging this month. At least, that is my excuse, and I am…

Mainstream NUMA and the TCP/IP stack: Final Thoughts

This is a continuation of Part IV of this article posted here. Note that a final version of a white paper tying this series of five blog entries together, along with a PowerPoint presentation on the subject, is attached. For many years, the effort to improve network performance on Windows and other platforms focused on reducing…

Mainstream NUMA and the TCP/IP stack: Part I.

One of the intriguing aspects of the onset of the many-core processor era is the necessity of using parallel programming techniques to reap the performance benefits of this and future generations of processor chips. Instead of significantly faster processors, we are getting more of them packaged on a single chip. To build the cost-effective mid-range…


VS2008 SP1 and .NET FX Beta Performance Improvements

You probably already saw Soma’s blog post on the Beta for Visual Studio 2008 and .NET FX 3.5 SP1. If you can, please download and install the Beta quickly (be sure to read the readme for Visual Studio Professional and for Visual Studio Team System first). The sooner we get your feedback, the sooner we can…

Thoughts on Intel’s recent hardware announcements

Intel recently briefed customers about the evolution of its processor architectures to support ManyCore processors. Highlights of the press briefing include the announcement of the quad-core Tukwila processor, which supports the IA-64 Itanium architecture, and a six-core x64-based processor called Dunnington that will be available later this year. The major focus of the announcement, though, was the…

Parallel programming: Where Do We Go From Here: Part 1

The Performance of Desktop Applications in the ManyCore Era

The Quad-cores are coming! The Quad-cores are coming! Beginning in early 2008, machines with the latest quad-core processors became available from the major manufacturers. Should you be excited about the prospect? Should you run right out and buy one? These machines will have 4 independent processor…
