Moore’s Law Is Dead, Long Live Moore’s Law


Herb Sutter has an insightful article that will be published in Dr. Dobb’s in March, but he’s been given permission to post it to the web ahead of time.  IMHO, it’s an absolute must-read.

In it, he points out that developers will no longer be able to count on the fact that CPUs are getting faster to cover their performance issues.  In the past, it was ok to have slow algorithms or bloated code in your application because CPUs got exponentially faster – if your app was sluggish on a 2GHz P4, you didn’t have to worry: the 3GHz machines would be out soon, and they’d be able to run your code just fine.

Unfortunately, this is no longer the case – the CPU manufacturers have hit a wall, and are (for the foreseeable future) unable to make faster processors.

What does this mean?  It means that (as Herb says) the free lunch is over. Intel and AMD aren’t going to be able to fix your app’s performance problems; you’ve got to fall back on solid engineering – smart and efficient design, extensive performance analysis and tuning.

It means that using STL or other large template libraries in your code may no longer be acceptable, because they hide complexity.
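
As a rough illustration of what "hiding complexity" can mean, here is a minimal sketch in Python (used instead of C++ and the STL purely for brevity, so it is an analogy rather than a statement about any STL container). The `CountingKey` class and the `comparisons_for_lookup` helper are hypothetical names invented for this example. They count how many equality comparisons a single `in` test performs on a list versus a set; the two lookups look identical at the call site, but the work behind them is very different.

```python
class CountingKey:
    """Hypothetical key type that counts how often it is compared for equality."""

    comparisons = 0  # shared counter, reset before each measurement

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def comparisons_for_lookup(container, probe):
    """Return (found, number of __eq__ calls) for one `probe in container` test."""
    CountingKey.comparisons = 0
    found = probe in container
    return found, CountingKey.comparisons


n = 1000
keys = [CountingKey(i) for i in range(n)]
probe = CountingKey(n - 1)  # equal to the last element, but a distinct object

# Same expression (`probe in container`), very different amounts of work.
list_found, list_cost = comparisons_for_lookup(keys, probe)
set_found, set_cost = comparisons_for_lookup(set(keys), probe)

print(f"list: found={list_found}, comparisons={list_cost}")
print(f"set:  found={set_found}, comparisons={set_cost}")
```

The list has to walk its elements one by one, while the set goes straight to the matching hash bucket. Nothing at the call site tells you which of the two you are paying for.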

It means that you’ve got to understand what every line of code is doing in your application, at the assembly language level.

It means that you need to investigate to discover if there is inherent parallelism in your application that you can exploit.  As Herb points out, CPU manufacturers are responding to the CPU performance wall by adding more CPU cores – this increases overall processor power, but if your application isn’t designed to take advantage of it, it won’t get any faster.
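
One standard way to estimate how much extra cores can help is Amdahl's law, which is not mentioned in the post or in Herb's article and is brought in here only as an assumption-laden model: if a fraction p of the work can run in parallel on n cores, the best-case speedup is 1 / ((1 - p) + p / n). A minimal Python sketch, where `amdahl_speedup` is a hypothetical helper name:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be spread across cores (0..1).
    cores: number of cores available (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# An app with no exploitable parallelism gains nothing from more cores;
# one that is half parallel gains far less than the core count suggests.
for fraction in (0.0, 0.5, 0.9):
    for cores in (2, 4, 8):
        print(f"parallel={fraction:.0%} cores={cores}: "
              f"{amdahl_speedup(fraction, cores):.2f}x")
```

Under this model a purely serial application gets a speedup of exactly 1 no matter how many cores are added, which is the point made above: the extra cores only pay off if the application is designed to use them.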

Much as the financial world enjoyed a 20 year bull market that recently ended (ok, it ended in 1999), the software engineering world enjoyed a 20 year long holiday that is about to end. 

The good news is that some things are still improving – memory bandwidth continues to improve, and hard disks are continuing to get larger (but not faster).  CPU manufacturers are going to continue to add more L1 cache to their CPUs, and those caches are likely to continue to improve.

Compiler writers are also getting smarter – they’re building better and better optimizers, which can do some really quite clever analysis of your code to detect parallelism that you didn’t realize was there.  Extensions like OpenMP (in VS 2005) also help with this.

But the bottom line is that the bubble has popped and now it’s time to pay the piper (I’m REALLY mixing metaphors today).  CPUs aren’t going to be getting any faster anytime soon, and we’re all going to have to deal with it.

This posting is provided “AS IS” with no warranties, and confers no rights.

Comments (40)

  1. Anonymous says:

    So if CPUs have hit the wall in terms of performance, aren’t we just going to see scalability outwards with multi-CPU systems becoming more commonplace? (with the OS abstracting away multi-CPU issues)

    I can possibly see this as a problem for a small fraction of developers out there (game developers, scientific app devs, imaging developers, etc) where performance is crucial, but will this really affect internal corporate developers writing web apps or developers writing standalone business software?

  2. Anonymous says:

    Ryan: Absolutely – that’s the point of Herb’s article.

    But multiple CPU cores will only help so much, and only if your application can take advantage of them (by either being multithreaded or by using OpenMP to expose finer grained parallelism in your application).

  3. Anonymous says:

    "It means that using STL or other large template libraries in your code may no longer be acceptable, because they hide complexity.

    It means that you’ve got to understand what every line of code is doing in your application, at the assembly language level."

    So no more .NET Framework then? 😉

  4. Anonymous says:

    If your app’s written to the .NET Framework today, and you’re happy with its performance, then there’s no reason not to continue to use it.

    If you write new code, you should write to the .NET Framework, because of the improved security/manageability (especially if you’re deploying network facing applications).

    But if you’re going to switch from unmanaged code to managed code, you can’t count on Moore’s law getting you out of the 5%-10% slowdown you’re going to get by going to managed code.

    And you need to be extra careful of your performance. You need to understand the performance ramifications of the various collection containers (System.Collections.ArrayList, System.Collections.Hashtable). Managed code is much easier to use, but part of the reason for its ease of use is that complexity is hidden – it’s really easy to write managed applications that perform poorly.

  5. Anonymous says:

    I think you’re absolutely right for systems developers and the people Ryan mentions: people on the edge of the performance envelope. For everybody else (the vast majority of developers) I’m not sure I see how it matters all that much.

  6. Anonymous says:

    mschaef, you might be right.

    But the only kind of developer that I can think of whose code won’t end up on the bleeding edge of the performance envelope are hobbyist developers and those who deal with a very limited set of customers.

    If you’re writing asp.net applications, then what happens when your asp.net application gets /.’ed?

    Remember the IBM ads where a small web company goes live and the product orders start coming in? They got 10, then 20, then 100, then 1000, then tens of thousands of orders. The IBM ad was basically about whether or not your web services platform could handle unexpected new traffic. The implication in those ads was that if you went with IBM you wouldn’t have this problem – but their point is still valid. If you’ve got a CPU bottleneck in your web application, you won’t be able to buy a faster box to run it on.

  7. Anonymous says:

    Larry…

    "If you’ve got a CPU bottleneck in your web application, you won’t be able to buy a faster box to run it on."

    Luckily web applications are highly parallel and are multi-threaded by nature (at least every technology I’ve used). Each request can run on its own thread. Multi-CPU servers will allow you to scale out this type of application. Heck, most large web applications run just fine in server clusters.

    That said, I agree with the premise of the article. I’ve noticed that since 2003 desktop CPUs have not gotten any faster (and it looks like it will be that way for 2005). Intel has recently stated that in the next few years, we’ll see new CPUs that have 10x the performance of current products. I’ll believe it when I see it. I wonder how long they can keep selling the same P4s with different product numbers?

  8. Anonymous says:

    Either we can revert back to programming in pseudo-assembly, or we could let the machines do the work for us instead. I’m voting for declaring shared state concurrency a bankrupt idea and putting our efforts into developing more modern languages.

    There is an interesting discussion (http://lambda-the-ultimate.org/node/view/458) on the exact same topic over on Lambda-the-Ultimate.

  9. Anonymous says:

    I don’t think we’ve got to go back to programming in assembly.

    But we DO need to understand the consequences of our code.

    Do you use System.Text.StringBuilder to concatenate strings or System.String.operator+=? It’s actually not clear which is more efficient – for some string values StringBuilder is more efficient, for others operator+= is.
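
One way to settle a question like this is to measure it on your own data rather than guess. The following is a minimal sketch in Python, not .NET, so it says nothing about the actual costs of StringBuilder or String concatenation. It only shows the shape of such a measurement, with repeated `+=` standing in for String concatenation and `"".join` standing in for a builder. The function names and the two sample inputs are assumptions made up for this example.

```python
import timeit


def build_with_plus(parts):
    """Concatenate by repeated +=, the stand-in for String concatenation."""
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    """Concatenate in one pass, the stand-in for a string builder."""
    return "".join(parts)


def compare(parts, repeats=100):
    """Time both strategies on the same input; returns seconds per strategy."""
    return {
        "plus": timeit.timeit(lambda: build_with_plus(parts), number=repeats),
        "join": timeit.timeit(lambda: build_with_join(parts), number=repeats),
    }


# Two hypothetical workloads: a handful of big pieces versus many tiny ones.
few_large = ["x" * 10_000] * 4
many_small = ["x"] * 10_000

for label, parts in (("few large pieces", few_large),
                     ("many small pieces", many_small)):
    print(label, compare(parts))
```

The timings will vary by machine and runtime, which is the point: the answer depends on the strings involved, so it has to be measured.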

  10. Anonymous says:

    "But the only kind of developer that I can think of whose code won’t end up on the bleading edge of the performance envelope are hobbyest developers"

    Well, it seems pretty clear (based on a simple perusal of software store aisles, and custom projects I’m familiar with) that it’s possible to do interesting and commercially viable software on systems with significantly "obsolete" hardware. For software that does need cutting edge performance I’d expect that most projects do things like leverage existing engines, and thereby leave the truly heavy lifting to the specialists.

    "and those who deal with a very limited set of customers. "

    Compared to a developer on Windows, that’s pretty much everybody. 😉

    "But we DO need to understand the consequences of our code. "

    I’m not advocating ignoring the consequences of our code, just making the statement that modern machines are powerful enough that naive approaches can be surprisingly effective (and cheaper to implement).

  11. Anonymous says:

    mschaef, you’re right, modern machines ARE fast enough.

    But the key takeaway (IMHO) is that the days of assuming that we can ship bloatware and rely on Moore’s Law to cover our mistakes are over – machines aren’t going to be getting significantly faster (without making source code changes), and that means that a lot of apps may be burned. For example, the Microsoft Picture It! team won’t be able to ship a pokey version assuming that in the near future machines will be fast enough to run their code well (I’m picking on the PictureIt team here, it’s actually pretty fast on my machine at home).

  12. Anonymous says:

    This is a huge counterargument against managed code. The working set of managed code is huge; the STL and the majority of C++ features are blazingly fast compared to the penalties incurred by the .NET runtime.

    Is Herb Sutter shooting down his own case for managed code here?

  13. Anonymous says:

    Amit,

    I actually disagree with it being an argument against managed code. Working set isn’t the issue here, Herb’s not talking about memory bandwidth, he’s talking about CPU bandwidth. Memory bandwidth is likely to continue to improve (and to be more important as time goes on).

    And it’s possible to write high performance ASP.NET applications (it’s also possible to write poorly performing ASP.NET applications).

  14. Anonymous says:

    I don’t quite get the argument. If my applications can’t run on current hardware, I’m dead in the water. I can’t wait for the next CPU.

    And this… "Concurrency is the next major revolution in how we write software." Uh, call me crazy, but isn’t that every Web app made today? It sure would suck if our apps were only handling one request at a time!

  15. Anonymous says:

    Jeff,

    I think I’d like to answer that question in more depth tomorrow.

    But the simple answer is that people have "known" for the past 20 years that if their app was just tolerable on the current generation of hardware that all they’d have to do is to wait for the next generation and it’d work just fine. So they’d ship apps that were pokey on current hardware because they knew that it’d be better on new hardware.

    The thing is that that assumption is no longer true.

    The world isn’t just web apps – and even web apps often have shared state that limits concurrency.

  16. Anonymous says:

    I think we’ve got bigger fish to fry than inefficient code… Look at nearly every protocol which is based around ASCII-7 text instead of something more efficient – like binary protocols.

    Sure, ASCII protocols are great for debugging, but there’s a LOT of waste in there that could be eliminated by switching to a simple binary protocol.
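
To make the size argument concrete, here is a minimal Python sketch that encodes the same three integers as a comma-separated ASCII line and as a fixed-width binary record. The record layout (three unsigned 32-bit little-endian integers), the sample values, and the function names are all assumptions invented for this illustration; no real protocol is being described.

```python
import struct

# Hypothetical record layout: three unsigned 32-bit little-endian integers.
RECORD = struct.Struct("<III")


def encode_text(values):
    """Encode as one comma-separated ASCII line."""
    return (",".join(str(v) for v in values) + "\n").encode("ascii")


def decode_text(data):
    """Parse a line produced by encode_text back into a tuple of ints."""
    return tuple(int(field) for field in data.decode("ascii").strip().split(","))


def encode_binary(values):
    """Encode as a fixed 12-byte binary record."""
    return RECORD.pack(*values)


def decode_binary(data):
    """Unpack a record produced by encode_binary."""
    return RECORD.unpack(data)


large = (1234567890, 4000000000, 65535)
small = (1, 2, 3)

for label, values in (("large values", large), ("small values", small)):
    print(label,
          "text bytes:", len(encode_text(values)),
          "binary bytes:", len(encode_binary(values)))
```

With these sample values the large numbers take 28 bytes as text against 12 as binary, while the small numbers take 6 bytes as text against the same 12 as binary, so the saving depends on the data being sent.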

  17. Anonymous says:

    Simon,

    You’re right, but typically the reasons for supporting text based over binary protocols tend to be pretty religious – there are strong arguments for both forms.

  18. Anonymous says:

    I disagree that hard disks aren’t going to get faster. In the last five years we’ve seen IDE hard disks go from 66MB/s to 150MB/s bus connection (the latter on Serial ATA), 0.5MB to 8MB cache size, and 4500 to 7200rpm spindle speed. The top end remains in the hands of SCSI, with the maximums currently 640MB/s bus speed (although 320MB/s is more common) and 15000rpm spindle speed.

    What will remain true, at least for the immediate future, is that I/O speed will be far slower than memory accesses. You still have to design for low working set. Remember: in whatever language or environment, the easiest way to reduce memory management overhead is to allocate less memory.

  19. Anonymous says:

    Larry,

    I am willing to take a USD $50 bet with you that processors will reach 4 GHz before the end of 2006. It is complete nonsense that the GHz ramp is completely over; it will just slow down. So there will be a combination of multiple cores per CPU, plus a more gradual increase in MHz.

    It is easy to predict things will slow down a little as Intel transitions to desktop chips built from the Pentium M core; but it is just sensationalist to suggest the ramp is over.

    It is hard to be precise about a bet because of this Pentium M transition; but there are many ways I am happy to bet.

    For example, I am also happy to bet that the latest core derived from Pentium M on Jan 1st 2007 will be at least 30% faster than the fastest Pentium M core in Jan 2005 (i.e. at least a 30% increase in pure MHz in 2 years).

    Let me know if you are willing?

  20. Anonymous says:

    David,

    The thing is that the RATE of performance improvements isn’t going to continue.

    While we’ll probably see 4GHz cores by 2006, if we were continuing on the curve that’s been present for the past 20 years, we’d be at 10GHz machines right now.

    The days of exponential growth in CPU speed are gone.

  21. Anonymous says:

    Moore’s law doesn’t say anything about the speed of processors. It says chip density doubles every eighteen months. It’s not the same thing.

  22. Anonymous says:

    Oracle built a whole industry around abusing Moore’s law.

    One of the nice things (to me, a laptop user and quiet pc freak) is how much effort is being thrown into making chips, drives, and such cooler, quieter, and lower power. Pentium M is an incredible push forward in that, the first time a truly mobile-designed chip rivals the raw power of a large chip, but it’s basically a reengineered P4, and the expected gains from its designs aren’t really much more than the expected end-term gains from P4 were before r&d on it was canceled.

    As religious as I am about P-M, I have to admit that games have been PCI/AGP-bound for years, and thankfully PCI-X is coming along just as gigabit ethernet threatened to overwhelm PCI.

    Dual-core is a real shifty point for me. A lot of code is going to have to be wildly reengineered to be massively thread-safe, let alone SMP-performant. Compilers are just catching up to serious pipelining optimization, and are still trudging through SMP.

    I am absolutely convinced that Intel’s research, along with various universities’, is going to lead to a breakthrough that will lead to viable new materials and powerhouse chip designs a few years later. But will that come next year? 5 years from now? 10? Until then we have to keep squeezing blood out of a turnip with better coding practices (yet more secure), optimizing holistically (no more days of just making inner loops faster), and convincing managers that the extra time is justified.

    No more Office Chrome++ releases, I hope this means. To say nothing of Mozilla, bloatware standard bearer of the open source world.

  23. Anonymous says:

    Bob, I know that. So does Herb Sutter. Read the linked post. The critical thing is that until recently, the increase in number of transistors has corresponded to the increase in CPU speed (if only because of speed of light issues – the transistors are closer, thus propagation time is smaller).

    That is no longer going to be the case.

    foxyshadis: You may be right – that’s why I called it a wall – for now, it’s immovable, but that may change.

  24. Anonymous says:

    We could just wait and see if some of the "waiting in the wings" technologies pan out.

    I’m talking about things like silicon on sapphire ICs going into widespread production instead of just being for space/military application… and more importantly, true 3D circuits on silicon instead of the 2.5D ones we have now (i.e. a few different substrate layers, but only a few – not truly flexible 3D).

    Of course this will all get easier once we have nanotech around to make the circuits element by element instead of having to deposit them a layer at a time.

    Once you have effective nanotech, you can start doing things like spin junctions, and switch to using photons for everything instead of electrons (because a signal sent as light only requires the energy for the photon; you don’t have to deal with charge pumping and propagation errors due to thermal noise between electrons). You can also create transistors on a scale much smaller than you can do using masks and electron beams; you don’t need to focus if you can get right in there with a nano-scale hammer and nail and assemble it a piece at a time.

    I think we’ve just hit a speed bump – not an actual break.

    Besides, if it gets really bad, there’s still quantum and biological computing to go. I’m all for a computer which doesn’t need 400W – but occasionally complains if it doesn’t get a Mars Bar.

  25. Anonymous says:

    foxyshadis:

    Actually the Pentium M is far closer to the Pentium III than the P4, which is nice because it doesn’t inherit the P4’s abysmal IPC performance. And I wouldn’t say that games are limited by the bus as much as the graphics card itself, unless your card has too little RAM and you’re pushing all your textures across the bus all the time. The change from AGP4x to AGP8x didn’t yield incredibly tangible performance increases and I don’t suspect that PCI-X will speed things up until we’re a couple more generations ahead with far more data being transferred to/from the card.

    I also suspect the trend of flashier and flashier UIs isn’t going to stop; it’s just that the work of rendering them will get offloaded more and more onto the GPU. (Isn’t that what Avalon is doing?)

    Your sentiment on the Office chrome is shared, though. That’s just way too much rounded-cornered shading overload. A little part of me dies every time I see some other application trying to copy that look&feel.

  26. Anonymous says:

    >> Besides, if it gets really bad, there’s still quantum and biological computing to go.

    Don’t ask me why, but the idea of biological computing just seems to be creepy sci-fi stuff, not real. Quantum-based computing is quite interesting, though I still have to wonder how exactly it will work.

    I’d imagine that at some point PCs will be comprised of several special purpose processors for most of the common computing tasks (graphics, sound/DSP, data compression(?)) and a general purpose processor for, well.. most application use. There is still a lot of parallelism in that architecture, but it will free the CPU itself from things such as processing and decoding a sound file, performing display functions and updating the display, so that it can do other things, from updating spreadsheets all the way to running complex models (some of which could potentially be offloaded to the DSPs, SPUs or GPUs…)

    I think we’re already sort of heading in that direction, but GPUs are not generally very easy to get to perform tasks that don’t go directly to the screen ATM. It may take an entirely different approach to finally get there.

  27. Anonymous says:

    I think that for a short time more developers will have to deal with concurrency issues. Shortly after that, there will be a few frameworks developed that manage most of the complexity for 90% of developers and do a better job than most could do on their own. Something along the lines of a shared state manager with a message passing system.

    If that is the case most developers will end up working with a system that hides more complexity from the developer than we have now. As other posters have brought up, in web development, most concurrency issues are already solved by the web server. Most of the time that I’ve tested doing explicit threading for handling web requests, the throughput has gone down. 2 synchronized threads per web request will usually not scale as well as 1 thread per request with fewer synchronization requirements.

  28. Anonymous says:

    Sorry, I meant reengineered P3. Thanks.

    The last thing I want is my processor fan running while my machine is idling. Ugh. Sun and MS can keep their 3d OSes.

  29. Anonymous says:

    Mike, isn’t this called the Wheel of something? Going back and forth between combinations of generic, centralized, specialized, and distributed is a neverending cycle in the computing world. Eventually someone notices that all those specialized chips are duplicating functionality and folds them into one for cost and power savings, then they start creeping out again as the single-chip limitations are reached.

    (This is also currently playing out in the Athlon 64’s integrated northbridge.)

  30. Anonymous says:

    Larry,

    Thanks a lot for pointing to Herb’s article. Most interesting read in months.

    WM_THX

    thomas woelfer

  31. Anonymous says:

    Hi Larry, belated Merry Christmas and Happy New Year. I haven’t been posting; I took the last 3 weeks off as a brain break and just spent time with family over the holidays, watching movies and enjoying life. I never touched a computer that whole time. It’s the first time in probably 20 years that I have spent more than a day away from computers. Funny how weird life is without it, unplugged; believe it or not, I never even knew anything about the tsunami until I returned to work and plugged back into the world.

    Anyway, on to my comment, kind of along the lines of what Mike Hall Posted recently http://blogs.msdn.com/mikehall/archive/2004/12/29/344100.aspx

    I think this is going to be more of a problem for those that entered programming in the last 5 years or so, the ones for whom this was just a way of life, the ones who never had to really look under the hood and understand everything that was going on. I tell you, under things like .NET you’ve got to dig really hard to get to that level, even if you do get there. I know I have sometimes gotten lost tracking it all the way back to some Windows API.

    Anyway, for those that programmed against the 386 or the 486 and processors like that, is this really going to be a problem? Another thing: will this be more of a problem for the upper management of these companies, the guys that say "I want this app to do this" when you know that there is no way it can be done without some performance hit? Yet they do not understand. Some of the managers have trouble remembering their password.

  32. Anonymous says:

    Thanks Jeff,

    You have a good point. And you may be very right.

    Joel’s current post in JoS is somewhat to the point: http://joelonsoftware.com/articles/CollegeAdvice.html

    especially where he discusses learning C. The JoS book has another good essay on this matter too.

  33. Anonymous says:

    It’s about time that the programmers and application geeks applied their own brainpower to the code bloat problem and resolved performance issues on their own rather than let it slide into the compiled app.

    Application managers need to really consider whether a "feature" that only 1% of the user base wants is worth the bloat and performance hit it brings to the final product.

    I, as a consumer of software products, would rather see one GOOD release per year/18 months (that is well tested, secure, and performance-tweaked), rather than a shoddy initial release and upgrade packages every 4 months for the first year.

  34. Anonymous says:

    John,

    The "code bloat" issue has been written about before. The best explanation I’ve heard for code bloat is: "People only use 50% of the app’s features. The problem is that no two users use the same set of features – every one of those "bloated" features is used by some significant percentage of the people who use the product". Features don’t come out of thin air, they’re added to the product because some set of customers wanted it.

    Joel Spolsky talks about this a bit in his book Joel on Software, where he discusses the word count feature in a word processor – a major word processor came out without it, and it was universally panned by professional writers (the guys who write the reviews of the product) because it didn’t have a word count feature. This is a feature that 99% of the users of the product don’t need, but to 1% of the customers of the product (published writers), it’s literally a must-have feature.

  35. Anonymous says:

    Way back in 1997, Nathan Myhrvold (CTO of Microsoft at the time) wrote a paper entitled "The Next Fifty…