Jeremy Mazner has asked me to put together a 400-level session at this year's PDC. I came up with the title "Five(ish) things every Win32 developer should know (but likely doesn't)". Of course, now I have to think of five things! Here are some ideas I've been kicking around.

- The memory scene: Physical address space != physical memory != virtual memory != virtual address space
- Consequences of the way CPUs work: How my O(n) algorithm can run circles around your O(log n) algorithm; why much of what you learned in school simply doesn't matter
- Parent vs. owner windows
- Asynchronous input queues, the hazards of attaching thread input, and how it happens without your knowledge
- Dialog units, DPI and the overloaded term "large fonts"

Would you go to a talk that covered these topics? If not, what topics would you rather hear me talk about?

**Follow-up**: The talk will be a little over an hour. (And fixed the title. Thanks, Dave.)

Is this a short (1-2 hour) session? Or longer?

Seems like some of those topics could take a pretty long time, depending on how you covered them.

Topics sound very interesting, btw.

Off_T: I know I’m going to get in trouble for this diversion, but is it too late to change the title of the thread to "I’ll see (some of) you in Los Angeles in September"? Note where I put the closing paren. One of the first things I do when I see a parenthetical clause is read the sentence without it to make sure the whole phrase still makes sense. As posted, it doesn’t.

I wouldn’t point out this kind of grammar thing on anyone else’s blog, but we’re all computer folks here, so we have an affinity for consistent, logical constructs in a symbolic language.

Back on topic: I really like the idea of "Consequences of the way CPUs work." I run into this kind of thing all the time with other developers: Hash tables vs. linear searches for small data sets and the like. You could fill the entire talk with just this topic.

Dave

I would go to that talk to hear those topics but I can’t go :(

Will PDC be video taped? If not, can you stick up some sort of blog formatted version of those topics?

Yes. All topics look very interesting.

Bullets 1 and 2 would make me go for sure based on the title.

I’d be really interested in hearing the talk about your O(n) alorithm beating my O(log n). That sounds really cool.

I think the developer world would benefit greatly if your talk included detailed thought experiments in the Tao of Raymond: "Imagine if two programs did this" and "Imagine if this were possible."

Those two thought experiments have helped a LOT of developers avoid doing a LOT of questionable programming.

I’d love to see the "How my O(n) algorithm can run circles around your O(log n) algorithm" talk

I would go to just to find out more about those damn dialog units.

Although the O(n) vs. O(log n) is intriguing:

Live from the Belgaio…

"Amazing! Raymond throws down an O-n and walks away. Johnny Licious is just staring at his O-log-n in disbelief. I don’t think he even thought that was possible. I haven’t seen a play like that since the Tijuana Finals back in ’87. Wow! Everybody watching get used to the name Raymond Chen, because this guy’s gonna be around for a while."

Grr!! All the good talks are reserved for you lot on that side of the pond!! Somehow I don’t think I’ll get a trip to LA out of our training budget.

security descriptors!

Sounds great, I’m already excited. Be sure and try and post these topics afterwards for those of us who won’t be able to make it to this PDC.

Those topics sound great, wish I could go to the conference.

pleasePostOnHere++;

Those sound like great topics.

I hope they’re planning on putting you in the biggest room in the place. Something tells me your session will be VERY popular. :)

I don’t care what you are going to talk about I just want to attend your talk!!

Seriously, I think it’s be interesting either way.

"Consequences of the way CPUs work: How my O(n) algorithm can run circles around your O(log n) algorithm;"

… except for large values of n?

Please provide these on the internet somewhere after the talk. I would /really/ like to see these (We are having nightmares with high dpi right now where I work :/ ), but alas, the time and the money are not around right now…

Five(ish) things every

Win32developer should know? Only three of your five are specific to Win32 (I assume that the "how it happens without your knowledge" of item #4 is Win32-specific)For PDC, I think it would be better to only have Win32-specific issues. One thing I would like to know is "why do I often have to call GetLastError when a function fails, where there is plenty of room in the return code of that function?"

I was on the fence as to whether I would go to PDC, now I’m definitely going.

Thanks Raymond.

The whole O(N)/O(log N) argument is ridiculous. Next week Raymond might advocate the abolishment of indexing in databases, as we all know sequential scanning is much faster than a Btree lookup, right?

I’d like to see all those practical applications where O(N) algorithms outperform their O(log N) algorithms. I can imagine it might happen if the total runtime is very small, but that doesn’t count as a scenario where performance matters.

A common mistake is to confuse worst-case and average case performance. Quicksort, a worst case O(N^2) algorithm, outperforms most O(N log N) sorting algorithms, but only because the expected average performance is actually O(N log N) too. No way the best bubble sort implementation will beat any good sorting algorithm on an interesting data set.

I think this would be an awesome talk to attend, although I live in South Florida and wouldn’t be able to make it.

Maks, perhaps you should attend his talk then. He’s going to demonstrate that by knowing your platform, you can reduce the constants associated with your running time by orders of magnitude, thereby allowing your "slower" algorithm to outperform your "faster" algorithm over your entire input data set range. There’s nothing rediculous about it, and it has nothing to do with average-case performance nor anything to do with the total runtime (these algorithms are a function of the input size, not running time).

Here’s a question for you. Radix sort is a sorting algorithm that runs in O(n) time. That’s faster than every comparison-based sort. Why don’t we use radix sort for everything?

Here’s some more things you can choose from: http://www.flounder.com/badprogram.htm

I have no idea how you’re going to manage to fit all that into an hour.

And I’ll concur with all those who want more information available here for those who can’t make it out there.

While I’d appreciate it if all my colleagues had a solid understanding of dialog units and DPI, I’m not sure you could inspire anyone to improve the flexibility of their dialog layouts. Let’s face it, dialog units were a good idea that didn’t quite work. Explaining the details isn’t going to inspire many people to check their templates at 96 and 120 DPI, nor will it help anyone align the *baselines* of labels next to edit controls (because it’s simply not possible with DLU). Besides, in seven or eight years*, we’ll finally be able to give up our DIALOG resources and use XAML to get things to layout right with any font at any DPI.

I think the first four topics would be excellent. Four the fifth, I’d second the suggestion from the other commenter about detailing your commonly-used thought experiments. I also hope the owner/parent window talk (which I’d love to see) would include some of the details of modality you’ve discussed here in your blog.

—

* Seven or eight years before all of the pre-Avalon-capable version of Windows have faded far enough into obscurity that our marketing departments stop requiring us to be backward compatible.

I would go even if only the first 2 items were covered.

Actually, the reason I’ll go is still only the first 2, but who cares? Looks great.

I hope there are transcripts available!

Some of the comments here indicate that #2 would be a much-needed talk. The number of times I’ve seen a hashtable used to index 10 items, it’s just not funny…

Sure, for 1,000,000 items a hashtable is going to be the data structure you want, but if you’ve only got 10 then the overhead is enormous!

Oh yeah, and I second the transcripts! Unless you wanna pay for my ticket… I have no problems accepting charity :)

"my O(n) is faster than your O(log n)" :) and "why much of what you learned in school simply doesn’t matter":

Oh, I know answers to these. Who hasn’t learnt that in schools, was either in the school before processor caches existed, or was in bad school.

First, O(n) and O(log n) doesn’t say anything about the constants. So for example, 2*n is faster than 10000 + 10 * log( n ) until is n big enough.

Second, level 1, level 2 caches and page fault effects. Localization is a big thing. Of the code and of the data. Of the code — sometimes "optimized for size" code is faster than "optimized for speed": reason: the second one fits better in the instruction cache. Of the data — correctly organized structures (where I don’t say only single structures, but the whole layout of your working program, or simple things like what’s where in matrices) all make up for big differences.

For anybody interested in a good sample of how the program can be written to optimally adjust to different processors:

The Fastest Fourier Transform in the West

http://www.fftw.org/

Maks Verver:

I don’t know if you’ve worked on actual shipping software on actual computers as opposed to theory but very commonly, algorithmically "worse" algorithms dominate until you get to extremely large data sets.

I’ll take linear search of an unordered array that doesn’t cross a page boundary over just about any algorithm which might cause either a TLB miss or a page fault.

For very large data sets, it’s all about optimizing the number of I/Os, so depending on what you’re using to assign the O() notation, you could be well off. Usually it’s about comparisons; btrees don’t win because they have a better O() description. They win because they do a good job of managing the number of I/Os that need to be done.

Frankly I find the O() notation rather useless. It rarely gives good guidance about which algorithms work best in real world engineering contexts. I guess you need to make things simple for students but in the real world, other factors tend to dominate. (e.g. notice I did advocate linear search that didn’t cause page faults or tlb misses. linear search of a linked list is death because you have no reason to believe that going from element to element won’t pull in more and more pages.)

Well I’m trying to think of an O(log n) algorithm for use in comparing against anything. Let’s see, to keep it simple, consider adding two unsigned integers a + b whose values are bounded by some integer n, and N is the number of bits in the representation of n. An ordinary human-style addition algorithm, moving bit by bit from the right and doing carrying as necessary, would iterate N times, i.e. O(log n). If we count on fingers, having each iteration decrement b by 1 and increment a by 1 until b reaches 0, then we get an O(n) algorithm. If this makes your head spin then this O(n) algorithm has succeeded in running circles around you. But has it run circles around another algorithm, no, so let’s try again.

Someone’s frequent blog’n gets read every day. Someone else’s b’n doesn’t. But wait a minute, then the log wins again.

I got it! Cut down a tree using n circular saws. The log will be beaten to death.

"Oh, I know answers to these. Who hasn’t learnt that in schools, was either in the school before processor caches existed, or was in bad school."

Even ignoring the effect of caches, linear sequential memory access are faster than random accesses.

When I was an undergrad, I remember some excitement in the math department at my school because they discovered a polynomial time algorithm for calculating something or other (I forget what). It was still slower than the existing exponential time algorithm for any reasonable N though. I think ths speed was checked up to N = 2^50 or so.

I think there will be video+slides available on the net. Make it so.

I think the topic O(n) vs. O(log n) is the coolest most amazing topic ever existed in the history of mankind. Raymond can you make the video of your talk available on the Internet?

I’d love to see a video from this Talk on Channel 9.

Should I sign some petition ?

Raymond, will you post some of that info for those of us who won’t be able to attend no matter how interesting the topics are? I would really like to see your views on the second item ("Consequences of the way CPUs work").

I can’t attend but I would vote:

"Consequences of the way CPUs work"

if I were going. If you are going to make the talk available online or if Channel9 would agree to film it and make it available I would really, really, really love that!

I would love to see Raymond on channel9 but I hear he won’t do video interviews=( I hope he and Dave C. have a change of heart!

I don’t like making public appearances, especially pure publicity ploys like an interview. Only reluctantly do I appear at conferences – This will be my first in 8 years.

big o:

> It has nothing to do with the total runtime (these algorithms are a function of the input size, not running time).

The O-notation relates the input size to the running time. That’s all it does.

> Here’s a question for you. Radix sort is a sorting algorithm that runs in O(n) time. That’s faster than every comparison-based sort. Why don’t we use radix sort for everything?

Apart from that you can’t radix sort arbitrary objects, it only runs in O(n) when you make assumptions about the range of the input elements, that real sorting algorithms don’t.

For example, I can write an O(1) algorithm if I know my input consists of unique 32-bits integers, because then the total runtime is bounded by a constant no matter what algorithm I use. But I can’t use that algorithm for sorting objects (or even numbers) in general, just like I can’t with radix sort.

mgrier:

> Usually it’s about comparisons; btrees don’t win because they have a better O() description. They win because they do a good job of managing the number of I/Os that need to be done.

You say it like those two are different things, but they aren’t: a btree does less I/O BECAUSE it has a better runtime complexity!

I, of course, meant why wouldn’t you always use radix sort for the subset of problems where you can choose between both a radix sort and a comparison-based sort. The answer is because radix sort has a number of hidden O(1) passes that depend on your choice of radix. Despite being a linear running time, these extra constant-time passes contribute significant overhead that the comparison-based sorts don’t have.

This example shows that you can make a more informed decision than just big-o notation when you choose your algorithms. O(lg n) is not always faster than O(n).

Oh, nonsense. You just didn’t pay attention in school, that’s all. This so called "real world" stuff is what is inconsequential, inconstant and ultimately uninteresting.

i’d definitely attend these, especially the second one about algorithmic complexity & cpus.

Bullets 1 and 2 would make me go

> I’d like to see all those practical applications where > O(N) algorithms outperform their O(log N) algorithms. > I can imagine it might happen if the total runtime is > very small, but that doesn’t count as a scenario where > performance matters.

As mentioned earlier, this can happen fortwo different reasons: if the constants in the complexity are large, or because of cache and paging issues. The second one has been adequately dealt with, so I’ll touch on the first.

The canonical example of a problem where an algorithm with better big-O complexity loses to one with worse, is polynomial multiplication. The "naive" algorithm for it is O(n^2), but you can get an O(n log n) algorithm using Fast Fourier Transforms.

However, the constants in the FFT algorithm’s complexity are so large that no one outside of undergrad classes in algorithmic complexity and very specialized scientific computing packages use it, because crossover (the point at which the O(n log n) algorithm becomes faster) isn’t until you start dealing with stupendously high-degree polynomials.

A few years ago,

I told a story of how Marketing messed up a bunch of PDC slides

by "helpfully" expanding…

Faking your way through a German transaction.