Why CPUs aren't getting any faster (2010) (technologyreview.com)
78 points by michael_nielsen on April 28, 2014 | 74 comments


CPUs are getting faster. Sandy Bridge is a 15-20% IPC improvement over Nehalem on some heavily integer- and memory-access-bound workloads. On the same workloads, Haswell is another 15-20% IPC improvement over Sandy Bridge.

I work on the Dolphin Emulator (https://dolphin-emu.org/), which is a very CPU-intensive program (it emulates a 730MHz PowerPC core, plus a GPU, plus a DSP, all of that in realtime). We try to track CPU improvements to provide our users with proper recommendations on what hardware to go for. Here are the results of a CPU benchmark based on our software: https://docs.google.com/spreadsheet/ccc?key=0AunYlOAfGABxdFQ...


Given that we seem to be hitting the practical limit on CPU frequencies, it's interesting to speculate on what kind of (single thread) IPC headroom is still available. You can model those IPC limitations by assuming a CPU with unlimited registers, unlimited functional units, a perfect branch predictor, etc. and seeing how much ILP you can extract from benchmark code. Tighten the CPU's constraints and you get a sense of which capabilities are most valuable.

Wall 1991[1] is the classic paper, AFAIK; with "quite ambitious" (for 1991) hypothetical future hardware, it found a limit of around 5-7 IPC on a 'median' benchmark. I'd be interested to hear how close Intel is getting and whether anyone's updated that result.

[1] http://www.eecs.harvard.edu/~dbrooks/cs146-spring2004/wall-i...
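
For intuition, here's a toy version of that kind of experiment (nothing like Wall's trace-driven methodology, just a sketch): the same additions arranged as one long dependency chain versus four independent chains. A wide out-of-order core can only overlap the independent chains, and that kind of headroom is roughly what those limit studies measure. The 4x interleave factor is illustrative, not tuned.

    // Build with optimizations but without -ffast-math, so the compiler can't
    // reassociate the serial float chain.
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = std::size_t(1) << 26;
        std::vector<float> v(n, 1.0001f);

        auto t0 = std::chrono::steady_clock::now();
        float serial = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            serial += v[i];                        // every add waits on the previous one
        auto t1 = std::chrono::steady_clock::now();

        float a = 0, b = 0, c = 0, d = 0;
        for (std::size_t i = 0; i < n; i += 4) {   // four independent dependency chains
            a += v[i]; b += v[i + 1]; c += v[i + 2]; d += v[i + 3];
        }
        auto t2 = std::chrono::steady_clock::now();

        auto us = [](auto dur) {
            return (long long)std::chrono::duration_cast<std::chrono::microseconds>(dur).count();
        };
        std::printf("serial chain: %lld us, interleaved: %lld us (sums %f %f)\n",
                    us(t1 - t0), us(t2 - t1), (double)serial, (double)(a + b + c + d));
    }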


The article doesn't go into it in depth, but I think the answer is that chip makers hit a wall somewhere over the 3GHz range, where it became difficult to ramp up CPU frequency without spending ridiculous sums on cooling the processor so it could operate properly (you'll notice that even now, the fastest Intel chips come in at the 3.1-3.2 GHz range ... there's a reason for that).

I was big into building computers during the CPU race between AMD and Intel back in the late 90s/early 2000s, and the Intel Pentium 4 processor line is notable for pushing the envelope from the high 2GHz range all the way up to 3.4GHz and 3.6GHz (I still have a 3.4GHz chip sitting in my home office ... those were the days!).

Wikipedia does a great job of chronicling what happened with the Pentium 4 line here http://en.wikipedia.org/wiki/Pentium_4 with hints at what I've just alluded to above:

"Overclocking early stepping Northwood cores yielded a startling phenomenon. While core voltage approaching 1.7 V and above would often allow substantial additional gains in overclocking headroom, the processor would slowly (over several months or even weeks) become more unstable over time with a degradation in maximum stable clock speed before dying and becoming totally unusable"

It was after the failure of this brute-force push for higher clock speeds that Intel finally went a different way, initially with the Pentium M line (code-named Dothan and Banias) http://en.wikipedia.org/wiki/Pentium_M and eventually with the Core Duo/Core series that they've since built on.


the processor would slowly (over several months or even weeks) become more unstable over time with a degradation in maximum stable clock speed before dying and becoming totally unusable

This is an electromigration[0] and/or hot carrier injection[1] problem. Both are exacerbated by temperature and voltage, but are not the reason chips are stuck at 3-4GHz.

One actual limiter is pipeline depth. A deeper pipeline is an easy way to enable an increase in frequency. The problem in a nutshell: the longer the pipeline, the greater the overhead and the larger the penalty for branch mispredictions.

[0] http://en.wikipedia.org/wiki/Electromigration

[1] http://en.wikipedia.org/wiki/Hot-carrier_injection
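
For anyone who wants to feel the misprediction penalty directly, the classic demo (a rough sketch; the exact ratio depends on pipeline depth, the predictor, and whether the compiler turns the branch into a conditional move): the same data-dependent branch over random versus sorted data. The deeper the pipeline, the more speculative work each miss throws away.

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Sums values >= 128; the branch is ~50/50 and unpredictable on random data,
    // trivially predictable once the data is sorted.
    static long long sum_big(const std::vector<int>& v) {
        long long s = 0;
        for (int x : v)
            if (x >= 128) s += x;
        return s;
    }

    int main() {
        std::vector<int> data(1 << 24);
        std::mt19937 rng(42);
        for (int& x : data) x = rng() % 256;

        auto time_it = [&](const char* label) {
            auto t0 = std::chrono::steady_clock::now();
            long long s = sum_big(data);
            auto t1 = std::chrono::steady_clock::now();
            std::printf("%-8s %lld us (sum %lld)\n", label,
                (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(), s);
        };

        time_it("random");
        std::sort(data.begin(), data.end());   // same work, predictable branch
        time_it("sorted");
    }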


[deleted]


But for the same processor, a higher clock speed means faster processing. And, to a lesser extent but still usefully, clock speed is a good indication of relative speed within the same processor line.

Let's not totally throw the baby out with the bathwater just because clock speed on different CPU organizations is not 100% equivalent. It's very often a good proxy.


[deleted]


> Using clock speed as a simple way to determine which CPU is faster than another is just plain wrong.

But neither of the posts you've replied to in this subthread have recommended that.


He's not comparing performance; he's pointing out that frequency has been fairly stagnant since the very early 2000s.


If not clock rate, then what measurement?

CPUs are sold by clock rate and processor type, e.g. an i5 at 3 GHz. So yes, you are technically correct, but clock speed has been the practical way that consumers know to measure speed.


[deleted]


True. But how many people run a single application at a time on a desktop?


Your haughty response is so incredibly out of place that I have to think that you replied to the wrong post or something. But there you have quotes from the parent post, so you must simply have a trigger finger for trying to point out that someone is wrong.

But they aren't. There is nothing they said that is wrong in any way. Indeed, you continued this with replies to various other people where again your replies barely relate to what they are saying.

What is your motive?


> you'll notice that even now, the fastest intel chips come in at the 3.1-3.2 Ghz range

While this nitpick does not change the overall argument, Intel's high end has hovered in the 3.5-3.7 GHz area for some time now, with "turbo" hitting 4 GHz.


Interesting that the article mentions SNB, because since then the gains in performance have been much smaller. SNB was the last "significant" gain in performance for Intel CPUs, I'd say (+35 percent over the previous generation). All of the new generations since then have gotten maybe a 10 percent increase in IPC, at best, and Broadwell will probably get a max gain of +5 percent.

To "hide" this, Intel has refocused its marketing on power consumption, where gains seem easier to achieve (for now), as well as other pure marketing tricks such as calling what used to be "Turbo Boost speed", the "normal speed". I've noticed for example recently a Bay Trail laptop being marketed at "2 Ghz", even though Bay Trail's base speed is much lower than that.


Depends on the application, of course. Haswell introduced some really cool new instructions (both the Bit Manipulation sets as well as the Transactional Synchronization Extensions). Don't have any numbers off-hand, but these can definitely speed up certain workloads.
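
Not numbers either, but a tiny example of the flavor of thing BMI2 adds (a sketch; assumes a BMI2-capable CPU and a flag like -mbmi2): _pext_u64 pulls the bits selected by a mask into the low end of the result in one instruction, something that used to take a shift-and-or loop or a lookup table.

    #include <immintrin.h>   // _pext_u64 (BMI2)
    #include <cstdint>
    #include <cstdio>

    int main() {
        uint64_t packed_fields = 0x00F0F0F0;   // pretend these are packed bit fields
        uint64_t mask          = 0x00FF00FF;   // keep every other byte
        uint64_t extracted = _pext_u64(packed_fields, mask);        // masked bits, now contiguous
        std::printf("0x%016llx\n", (unsigned long long)extracted);  // prints 0x000000000000f0f0
    }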


Apparently in specific workloads the move from AVX -> AVX2 was very substantial, though I can't seem to find the article that showed this.


The transition is more like SSE2 -> AVX2 for integer workloads. AVX2 adds 256-bit-wide integer instructions and a gather instruction. Gather does multiple loads in one instruction.

AVX was just about floating point.
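
A minimal sketch of those two additions (assumes a Haswell-class CPU and something like g++ -mavx2; this just shows the operations, it isn't a benchmark):

    #include <immintrin.h>
    #include <cstdio>

    int main() {
        alignas(32) int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        alignas(32) int b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
        int table[16] = {0, 100, 200, 300, 400, 500, 600, 700,
                         800, 900, 1000, 1100, 1200, 1300, 1400, 1500};

        __m256i va  = _mm256_load_si256((const __m256i*)a);
        __m256i vb  = _mm256_load_si256((const __m256i*)b);
        __m256i sum = _mm256_add_epi32(va, vb);               // eight 32-bit adds at once

        __m256i idx = _mm256_setr_epi32(0, 2, 4, 6, 8, 10, 12, 14);
        __m256i g   = _mm256_i32gather_epi32(table, idx, 4);  // eight indexed loads at once

        alignas(32) int out[8], gout[8];
        _mm256_store_si256((__m256i*)out, sum);
        _mm256_store_si256((__m256i*)gout, g);
        for (int i = 0; i < 8; ++i) std::printf("%d %d\n", out[i], gout[i]);
    }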


> as well as other pure marketing tricks such as calling what used to be "Turbo Boost speed", the "normal speed"

I always found calling it Turbo Boost speed odd. For the majority of applications that exist today, that is the speed you will get.


The point of the Turbo Boost speed is that it's unsustainable, so you may only get it in short bursts. If it weren't unsustainable, it would be the real speed of the chip. But they can't make these chips run at 3 GHz and also claim 10W of power consumption.

By using Turbo Boost, they have their cake and eat it, too. They can do well in benchmarks thanks to Turbo-Boost, but also do well in power consumption tests, because most of the time the chip will not be running at Turbo Boost speed, and will be throttled to "normal speed" as soon as it passes its power consumption limit.

So Intel is being misleading by using Turbo Boost speed as the normal speed.


Ah, I assumed the Turbo Boost was just due to the fact that if you are using all four cores you can't clock yourself quite as high.


I personally have my own idea about that: sure, we have hit many walls, but I don't think that's the main reason for the slowdown in CPU development. I think it's mostly because R&D moved from making the CPU faster to making the CPU consume less, following the laptop/mobile market (as everyone loves/hates to say, every year is the year of the death of the PC).

Also, we're at the point where most of the very demanding software isn't bottlenecked by the CPU, or where you can just throw more cores at the problem and solve it. Software is also starting to leverage GPU acceleration, which gives a huge boost when usable. And GPUs are getting a lot faster every new generation.


This is why CPUs aren't getting any faster:

https://www.google.com/search?q=c+%2F+5+ghz


care to explain? I'm curious what you mean by that google search


It means signals can only propagate about 6 centimeters during a single CPU clock cycle. That makes accessing information off-chip difficult in a single cycle, since 6 centimeters is on the same order as the linear scale of most motherboards.

This limit doesn't immediately impact computations that are carried out entirely on-chip. I expect that's the great majority of computations on today's CPUs, which typically have large caches. So I'm not sure the GP's explanation is correct, although it's certainly interesting.
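
For reference, the arithmetic behind that search query (this is the vacuum speed of light; on-chip signal propagation is slower still):

    #include <cstdio>

    int main() {
        const double c = 299792458.0;                       // speed of light, m/s
        const double freqs_ghz[] = {1.0, 3.0, 5.0, 10.0};
        for (double ghz : freqs_ghz) {
            double cm_per_cycle = c / (ghz * 1e9) * 100.0;  // distance light covers in one cycle
            std::printf("%5.1f GHz -> %.2f cm per clock cycle\n", ghz, cm_per_cycle);
        }
    }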


As an explanation for why you can't get clock speed increases of the same order of magnitude as before without changing the entire technology, it would seem correct.

6 cm now means you only need to double the frequency twice before the per-cycle distance drops below the die width.

Also, electricity does not propagate through copper at c, but at between 40% and 90% of it, depending on the inductance and capacitance of the circuit. So you are looking at even less of an increase before you start running into some really thorny problems: not only are you running up against relativistic limitations, but different bits of your chip can be doing so at noticeably different times.


Nothing can travel faster than the speed of light, including electric current. So in a hypothetical 5 gigahertz CPU, there couldn't be a path through the wires/transistors longer than 5.99584916 cm, otherwise the end result couldn't be correctly determined. That's why CPUs got smaller and smaller. But since transistor shrinking is slowing down, and adders/multipliers/registers/decoding logic etc. all take some time and length, it's getting harder to make a CPU with a shorter longest path.


This is actually less of an issue than you think it is. You can use asynchronous communication. But leaking 5GHz outside your CPU is a BAD idea.


The power wall theory is a bit odd though. Why are modern Intel desktop CPUs limited to such low power budgets? Ivy Bridges were just 77W (TDP), and now Haswells are apparently 65-84W. Desktop platforms should be able to handle far more power, at least in the 100-150 watt range. Meanwhile, desktop GPUs are regularly hitting 200-300 watt TDPs, with far more limited cooling systems.

Why isn't Intel able (or willing) to push the power envelope higher in desktops?


Maybe because Intel "desktop" processors are just laptop processors in slightly different packaging. Or maybe they want to push more customers into the more expensive -E segment.


But even the enterprise parts only have higher TDPs due to higher core counts - the cores themselves are the same as the consumer desktop parts, with a bit more cache thrown in and some transceivers to help scalability to even higher core counts that we don't want. Nobody is trying to design the fastest quad-core processor, let alone dual-core or single-core. No one is willing to commercialize a chip that is 10% faster at single-threaded tasks when it will be a fourth the speed at highly-threaded tasks, especially when the processor can self-overclock one core when it's the only one in use, thereby offsetting some of the core-count tradeoff.


They're different chips - the laptop chips get much better perf/watt than the desktop components which are comparatively sloppy. That power management hardware is actually somewhat expensive technologically.

It really has more to do with the customers. The people buying these chips by the thousands are enterprise desktops, and keeping power bills down is their #1 goal. Those customers don't buy discrete GPUs in these machines either; they expect Intel's integrated graphics to be "good enough."

Unfortunately, consumer desktop purchases are basically a rounding error to Intel's multibillion dollar balance sheets, so they don't build parts for that segment. Instead, they blow some fuses on their enterprise parts and sell them as "Extreme Performance", "Enthusiast Desktop" or whatever other euphemism they want to use for "overclockable" these days.


Additions to the instruction set can help out where raw GHz don't get things done.

Another big improvement has been moving certain functions to hardware - Intel's Quicksync is a great example of this.


The problem is that these advancements are one-offs and are increasingly difficult to both create and adopt. There was a great paper I read on compiler optimizations (I was a language guy in college) that pointed out that compiler optimizations were responsible for maybe a 2x improvement in program speeds, whereas CPU improvements were responsible for 1000x.

Almost all the singularity-type techno-optimism was based on a curve drawn during the amazingly magical part of CPU development: shrink the die, double the speed.

We are now past that, although many techno-optimists are casting around for a reason it will continue.


One of the interesting things in the Kurzweil-type techno-optimism stuff is that the calculations/sec you get for $1000 seems to have been doubling regularly since 1900 or so, long predating Moore's law, and the phenomenon seems largely independent of which particular technology is used to implement it (see https://en.wikipedia.org/wiki/File:PPTMooresLawai.jpg). I figure it must be largely an economic phenomenon, in that it's worthwhile for the manufacturers to invest so they can say 'you should buy next year's model - it's 50% faster than this year's.' These days the technical implementation seems to be more cores - there seems to be no absolute limit on how many parallel processors you can have.


Honestly if we can get multi-core programming working we can probably keep it going.

Although that is an enormous if.


Most problems are not trivially parallelizable and, in any event, Amdahl curves look a lot different than the exponential curves used to argue for The Singularity.

We may have to make do with what we've got.

Which, thankfully, is quite a lot.
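
To put a number on that, Amdahl's law in one line (a toy calculation; the whole argument is the size of the serial fraction): even a 95%-parallel program tops out below 20x, no matter how many cores you throw at it.

    #include <cstdio>

    // Amdahl's law: speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n cores.
    double amdahl_speedup(double parallel_fraction, int cores) {
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
    }

    int main() {
        const int core_counts[] = {2, 4, 8, 64, 1024};
        for (int n : core_counts)
            std::printf("p = 0.95, %4d cores -> %.1fx\n", n, amdahl_speedup(0.95, n));
    }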


> Most problems are not trivially parallelizable

Numeric optimization can benefit from parallel line search, and from threads using different initial points. Matrix algebra benefits from more cores. Sorting benefits from more cores. Encryption benefits from more cores. (De)compression benefits from more cores. A server handling simultaneous requests benefits from more cores. Cracking hashes benefits from more cores. Computing merkle trees benefits from more cores. Flash banners in 20 tabs "benefit" from more cores.


Well, our brain has a single-thread performance that is already many orders of magnitude lower than that of our current CPUs. Any claim that Amdahl's law will forbid human-like intelligence must explain why that isn't a problem for us.

Or, like you said, yep, we do have a lot.


I hope we see an increase in real software parallelism, since that's the only real way out of this for the foreseeable future. Tacking on more cores is still an option we have, we're just having trouble using them right now in many contexts.

In the longer term, we'll hopefully see advancements that let us fundamentally change how logic processors are constructed, such as possibly photonic logic chips. Only a major shift will let us break through the current single-thread performance wall.


New architectures like the Mill (http://millcomputing.com/) could provide alternative ways to a breakthrough increase in performance.

On the software side, I've always understood browsers are pretty good at parallelism, which is a pretty major platform that gets performance benefits. That could also extend even further with projects like Mozilla's Servo (https://github.com/mozilla/servo), a browser engine built from the ground up with parallelism in mind.


Yet JavaScript is fundamentally single threaded by design...


I don't think that's actually true. The way it uses asynchronous callbacks allows for lots of parallelism. For example, creating a new image will cause it to download the image off the network and decompress it in parallel with executing JavaScript, and then when done it fires the 'onload' handler. That's much more parallel-by-design than something like C++.


Unfortunately you can't actually run those callbacks in parallel, because of JavaScript's run to completion model.

The work we're doing with PJs, however, attempts to fix this problem :)


Take a look at Chromium as an example. Every tab gets its own process. The GPU gets its own process. Every plugin object gets its own process. Page loading is done in a separate process. Web workers run in a separate process. Chromium will happily eat up every core your CPU has to offer.


If you're running multiple tabs at the same time and actually interacting with both. But most of the time you're only laying out and rendering one page at a time; your browser actually only displays one tab at a time, after all.


Weren't there walls back in the 90s? I would rather bet on a new tech leap than go with federated designs at this stage.


Yes, there were walls back in the 90s and yes, we managed to break through them. However, the walls we face today look much nastier. I concede that there's no way to know the future, but nonetheless I'd put my money on a sharply slower rate of improvement for CPUs.

In the short term, it would be great if we had a breakthrough in x-ray power sources (for cheaper EUV lithography), a breakthrough in etch control (for vertical transistors), or a breakthrough in III-V materials (for a one-time mobility improvement).

But long term, we need to find an alternative to the transistor. Perhaps spintronics, perhaps non-von Neumann architectures, who knows. But there is no way that a paradigm-changing redesign will be competitive with silicon in the next 10 years. Silicon has a humongous advantage in manufacturing, supply chain, know-how, scaled production, etc. Even if we find a better technology (and we haven't) it may still take decades to get there.


The article doesn't really seem to answer the question the title says it does.

Of course there's the well-known reasons, nonlinearity of power vs frequency scaling, diminishing returns in hardware design, etc. But there are others that we don't hear so much about.

Hardware design is still in a pretty nascent stage, technology-wise. The languages used (say SystemC or Verilog) offer very little high-level abstraction, and the simulation tools suck. Sections of the CPU are still typically designed in isolation in an ad-hoc way, using barely any measurements, and rarely on anything more than a few small kernels. Excel is about the most statistically advanced tool used in this. Of course, CPUs are hugely intertwined and complicated beasts, and the optimal values of parameters such as register file sizes, number of reservation stations, cache latency, decode width, whatever, are all interconnected. As long as design teams only focus on their one little portion of the chip, without any overarching goal of global optimization, we're leaving a ton of performance on the table.

And for that matter, so is software/compiler design. The software people have just been treating hardware as a fixed target they have no control over, trusting that it will keep improving. That makes us lazy, and our software becomes slower and slower, by design (The Great Moore's Law Compensator if you will, also known as https://en.wikipedia.org/wiki/Wirth%27s_law).

The same problems we see in hardware design, huge numbers of deeply intertwined parameters, also apply to software/compiler design. We're still writing in C++ for performance code, for chrissakes. And even beyond that, the parameters in software and hardware are deeply intertwined with each other. To optimize hardware parameters, you need to make lots of measurements of representative software workloads. But where do those come from, and how are they compiled? Compiler writers have the liberty to change the way code is compiled to optimize performance on a specific chip (even if this isn't done so much in practice). To get an actually representative measurement of hardware, these compiler changes need to be taken into account. Ideally, you'd be able to tune parameters at all layers of the stack, and design software and hardware together as one entity. That is, make a hardware change, then do lots of compiler changes to optimize for that particular hardware instantiation. This needs to be automated, easy to extend, and super-duper fast, to try all of the zillions of possibilities we're not touching at the moment. There are even "crazy" possibilities like moving functionality across the hardware/software barrier. Of course it's a difficult problem, but we've made almost zero progress on it.
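
To make that concrete, the loop I'm imagining looks something like the sketch below. Everything here is hypothetical: Config, run_workload and the parameter values are stand-ins for "rebuild the compiler with these knobs, simulate this chip, measure", which is exactly the part that is currently too slow and too manual to sweep like this.

    #include <cstdio>

    // Hypothetical joint hardware/compiler parameter space.
    struct Config { int rob_entries; int l1_kb; int unroll_factor; };

    // Hypothetical stand-in for "simulate this chip running this workload compiled
    // with these flags"; the real thing takes hours per point, which is the problem.
    double run_workload(const Config& c) {
        return 1e6 / (c.rob_entries * 0.5 + c.l1_kb * 2.0 + c.unroll_factor * 10.0);
    }

    int main() {
        const int robs[]    = {128, 192, 224};   // hardware knobs
        const int l1s[]     = {32, 64};
        const int unrolls[] = {1, 2, 4, 8};      // a compiler knob, swept jointly

        Config best{};
        double best_time = 1e18;
        for (int rob : robs)
            for (int l1 : l1s)
                for (int unroll : unrolls) {
                    Config c{rob, l1, unroll};
                    double t = run_workload(c);
                    if (t < best_time) { best_time = t; best = c; }
                }
        std::printf("best: rob=%d, L1=%dKB, unroll=%d (cost %.1f)\n",
                    best.rob_entries, best.l1_kb, best.unroll_factor, best_time);
    }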

Backwards compatibility is another reason. New instructions get added regularly, but those are only for cases where big gains are achieved in important workloads. For the most part, CPU designers want improvements that work without a recompile, because that's what most businesses/consumers want. One can envision a software ecosystem that this wouldn't be such a problem for, but instead we have people still running IE6/WinXP/etc. Software can move at a glacial pace, and hardware needs to accommodate it. But this of course also enables this awfully slow pace of software progress.


As long as design teams only focus on their one little portion of the chip, without any overarching goal of global optimization, we're leaving a ton of performance on the table

Parameters like register file size are not determined in design (except where simple physics dictates). That's an architecture problem, which is constrained by the quality of simulation capabilities (which you did mention).

Another interesting thought- the design of a CPU starts many years before it hits shelves. So to accurately model performance (to make good decisions), architects must simulate a CPU running the code of five years in the future! There are some things that stay constant- tight loops, high level branches... but what about multithreading, for example? How multithreaded will the software of 2019 be?


As an EE working as a programmer, and with other programmers, I hear a lot of these arguments whenever the matter of "why CPUs aren't evolving" comes up. There's a lot of misconception here.

> Hardware design is still in pretty nascent stage, technology-wise. The languages used (say SystemC or Verilog) offer very little high-level abstraction

There are a lot of things people outside microelectronics miss in this:

* Much (most?) of the research invested into HDLs is not related to the languages themselves. Verilog and VHDL and SystemC are really good enough for the things they describe. Really, they are; they may not look like much in terms of "high-level abstraction", but when you need to understand the relation between what you write and what gets done on silicon, that's actually a feature. It's turning Verilog code into silicon that's actually challenging and where a lot of effort goes. The benefits of being able to describe something in a functional manner and 10% fewer lines of "code" are dwarfed by the benefits of getting better output from a Verilog compiler.

* Some (not all, but some) ASIC design is actually done without HDLs. I don't know if Intel still does it. AMD stopped at one point and the results were tragic.

* There are, seriously, realistically, very few actual obstacles that stem from HDLs. Physical obstacles -- not just the laws of physics, but the difficult technological processes -- are considerably bigger obstacles. And then there's market pressure:

> Sections of the CPU are still typically designed in isolation in an ad-hoc way, using barely any measurements, and rarely on anything more than a few small kernels.

"Barely any measurements" is not an accurate description of the ASIC design process IMHO. Yes, simulation/model extraction tools are used a lot more than in other fields of electronics, but that's somewhat hard to avoid when a) you're designing for a barely tested technological process (because it's brand new!) and b) prototypes are kind of expensive.

> Excel is about the most statistically advanced tool used in this.

I haven't done CPU design but I've seen some IC design being done (and wrote tools for people who did it). Maybe it's true for some CPU manufacturers, but I doubt it. I'm pretty sure that it especially won't cut it for mixed-signal designs like microcontrollers.

> As long as design teams only focus on their one little portion of the chip, without any overarching goal of global optimization, we're leaving a ton of performance on the table.

I'm not sure what you understand by "global optimization" but there's a lot of research done on topics like power-efficient caches, which is at least a form of optimization of more than one trait.

> The software people have just been treating hardware as a fixed target they have no control over, trusting that it will keep improving.

This is, unfortunately, an accurate description. There was a time when software people weren't treating hardware as such, due to a lack of abstraction-level tools, and I'm quite sure not even Wirth wishes to go back in time to then.

> To optimize hardware parameters, you need to make lots of measurements of representative software workloads. But where do those come from, and how are they compiled? Compiler writers have the liberty to change the way code is compiled to optimize performance on a specific chip (even if this isn't done so much in practice).

This is actually done quite a lot in practice, even within a single CPU family, like on x86, although granted not as much as it could be done. However, optimizing hardware parameters is somewhat meaningless as long as you need a general-purpose CPU. You can only go so far when considering potential workloads -- but that is being done, too. IBM's POWER cores, for instance, still try to improve sequential performance, because many of them still run single-threaded batch jobs for much of their functioning time.

> To get an actually representative measurement of hardware, these compiler changes need to be taken into account. Ideally, you'd be able to tune parameters at all layers of the stack, and design software and hardware together as one entity.

This has actually been attempted since the first reconfigurable logic arrays emerged in the late 70s/ early 80s. Unfortunately, it's plagued by the problem that the infrastructure required to dynamically reconfigure logic circuits is a drag. That's actually what makes FPGAs so slow.

I don't think FPGAs are the end of it, but "almost zero progress" is many light-years away from being an accurate description of the progress that has been made since MMI's PAL.

> Backwards compatibility is another reason. New instructions get added regularly, but those are only for cases where big gains are achieved in important workloads.

It's also done because practical experience has shown that, barring extreme stuff like OISC or perverted register-starved architecture design, instruction sets literally have negligible gain in terms of performance. Given enough time on the market, compilers that efficiently exploit any architecture will be designed. I mean shit it worked for the 80386!

However, not adding new instructions != not adding new features, not since microcoded architectures appeared. Granted, manual optimization is (sometimes?) still needed for extended sets (e.g. SSE), but their applications are niched enough that those applications that can't make use of them based on compiler inference alone are bound to be hand-optimized anyway.

Edit: compulsory disclaimer. I didn't do industrial level IC design, but I used to do research directly related to it and write software for that. This does mean I never actually did anything significantly more complex than a couple of current mirrors and logic circuits, but I did work with people who did. However, my experience is more on the analog IC side. I'm only casually familiar with logic IC design, so I may be unintentionally bullshitting you :-).


Hey, thanks for the critiques, a lot of them I don't really have a good answer for. A lot of these issues are not fully clear in my head (I haven't really done CPU design either, besides very basic college-level VHDL stuff), and I'm not familiar with all of the current (or even past) research. A lot of my opinions come from my somewhat nebulous experience in the industry, and a lot of that is just what I've picked up from talking with people that would know (and that I'd rather not get too into on a public forum). There's definitely some hand-waving in there too, trying to be provocative and idealist :)

I'm mostly a software/compiler guy, where I know there can be a whole lot of improvement (especially in mainstream tools, not just toy/research projects), but my impression is that hardware isn't much better off. You've given me some good pointers for further topics to research, though. My email is in my profile if you want to chat more about this stuff!


> My email is in my profile if you want to chat more about this stuff!

Coming up :-). Cheers!


you're designing for a barely tested technological process (because it's brand new!)

Not even- at early stages of the project, you're designing for a technological process that is purely theoretical!

Well, we estimate that when we figure out how to make these transistors a year or two from now, they will be about X fast...


Indeed! High-end CPU design is often literally done for a process that does not yet exist, which makes measurement rather difficult :-).


Yes, there are hard technical problems, but there always have been, and these have been solved when the incentive was there.

There are also three economic reasons:

1. Fabs that make chips are much more expensive now.

2. Intel has no real competition, AMD limps along.

3. Most desktop users are happy with their computer's speed; many still run XP.


I'm curious if anyone here has any perspective on how close we are to absolute physical limits in CPU design. Last I heard, we're getting pretty close to dealing with quantum issues due to how small the transistor and connection size is getting, the frequency of light we need to do the etching, etc. I wonder if anybody knows how close we are to hitting hard limits in various categories. Surely, we'll hit some eventually, and I wonder what happens then.


Charge carriers tunneling through the insulating barrier of MOSFETs have been a problem for a few years already. The good news is that it reduces exponentially with voltage and insulator thickness, so it can be worked around.

Doped areas on silicon transistors have a minimum diameter of around 10nm; that's the uncertainty about the position of charge carriers. Features a bit smaller than that will probably work, but much smaller ones won't.

That said, we are still far from the ultimate limits on CPU design. The limits above apply to CMOS CPUs made of silicon, which, despite describing all of the currently produced ones, is only one among many possibilities.


> Surely, we'll hit some eventually, and I wonder what happens then.

Probably some radical break through using some basic physics principle some guys worked on 30 years ago and left in a library somewhere ;).


Grace Hopper explains it best:

http://www.youtube.com/watch?v=JEpsKnWZrJ8


The end of Dennard scaling is the root-cause. This is causing power density to stop scaling with smaller transistors and will ultimately be the end of Moore's law.

http://research.microsoft.com/en-us/events/fs2013/doug-burge...
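
The back-of-the-envelope version of that argument, with illustrative numbers only: dynamic power goes roughly as alpha * C * V^2 * f. While C and V both shrank with every node, you could raise f and keep power density flat; with supply voltage now roughly stuck near 1 V, any frequency (and voltage) increase shows up almost directly in the power budget.

    #include <cstdio>

    // Classic dynamic-power estimate: P ~= alpha * C * V^2 * f
    // (activity factor, switched capacitance, supply voltage, frequency).
    double dynamic_power(double alpha, double c_eff, double vdd, double freq) {
        return alpha * c_eff * vdd * vdd * freq;
    }

    int main() {
        // Illustrative numbers, not measurements of any real chip.
        std::printf("3.5 GHz @ 1.2 V: %.0f W\n", dynamic_power(1.0, 20e-9, 1.2, 3.5e9));
        std::printf("5.0 GHz @ 1.4 V: %.0f W\n", dynamic_power(1.0, 20e-9, 1.4, 5.0e9));
    }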


Hmm, I don't think so.

I'd say the end of Dennard scaling is why frequencies stopped going up around 2003.

What's killing Moore's law now is not Dennard's law ending, but the increasing price of lithography technologies. Right now we face the unpalatable choice between quadruple patterning (very expensive) and EUV (also very expensive). The industry is between a rock and a hard place.


IMO CPUs aren't getting any faster because we don't really need them to be much faster.

Now before the flames begin, let me caveat that as "We don't really need them to be much faster at single-threaded workloads." The article mentions this in a roundabout way in the context of specialized CPUs, specifically GPUs: GPUs are basically hyper-concentrated thread runners. They're not very fast at running any single thread, but they have efficient shared memory and can run thousands of individual threads at once.

For larger workloads, we've gotten a lot more efficient through cloud computing. An individual CPU in the cloud is really not any faster than it was 5 years ago; but the advances made in energy efficiency (aka heat) and miniaturization mean you can fit a lot more of them in a smaller space.

While the technical hurdles to going faster are very real, I think we've built a technical infrastructure that's just not as reliant on the performance of any single piece of the system as it used to be. Therefore there is less demand for faster CPUs, when for many of the traditional "hard" computational workloads, more CPUs works almost as well and is a lot easier to scale than faster CPUs.


[deleted]


Your i7s 2.5 years apart get a ~25% improvement in benchmark ratings.

A PC with an 80286 was like 4x faster than a PC with an 8088, and they were about 3 years apart (1979-> 1982 chip introductions, the computers were later for both). And not just 4 times faster in some tests, an 80286 was much faster at everything in a way any consumer could see.

Only a programmer is going to buy a new computer for a 25% speed increase. The pace of improvement is slowing.


I was more referring to the top of the line server CPUs (i.e. the fastest possible.) In laptops and mobile, they make a balance between power draw and CPU speed. Many of the advances in the last few years have not been on raw speed, but rather on how to draw less power.

In many ways, this somewhat proves my original point: raw CPU speed is not as important in a laptop setup as portability and battery life. The products that CPUs are being used in are optimizing for battery life and heat dissipation, and using the fastest CPU available that fits within those criteria. And at the end of the day, I really don't think you'd notice the CPU speed boost without a benchmark score. Is it nice to have? Sure, why not. But I know the MacBook Pro I had in 2007 was fast enough to do whatever I wanted; anymore I only upgrade to get a better screen...


CPU power is a finite resource. To increase that resource in a usable way would certainly benefit "we".


I agree; and we've been doing that. But the article was about "why aren't CPU's getting any faster?" CPUs aren't really getting much faster and haven't for a while; we've just got access to a lot more of them at a lower cost than ever, so the collective "we" still get the benefit of that.


I realize this article is from 2010, but it could have mentioned AMD, which does have a 5 GHz chip available now. It requires liquid cooling however.


IBM beat that by a hair or two; 5 GHz POWER6 processors came out in 2007. And the newer, faster POWER7 chips are clocked lower. Clock frequency != performance.


or "why it's more and more relevant to code with performance in mind, and consider minimalist designs"


In my experience a minimalist design often requires more effort behind the scenes. So, I don't see the two having any relationship.

Just one of many possible examples: the minimalist iOS 7 interface has an embedded physics engine so that all the springy/sliding views feel natural to the user.


I think this feature is not minimalist at all. Smooth scrolling consumes a lot of battery and processing power.

> In my experience a minimalist design often requires more effort behind the scenes.

Why? When I say minimalist, I essentially mean fewer useless fancy features. Minimalism focuses on the useful, not on the feeling, so it can leave more room for other new features. The iPhone has the power of a 1995-2000 full-fledged desktop PC or maybe more, yet it's using a lithium battery and it's 1cm thick.

This applies to all hardware: it's not worth it anymore to invest into new hardware if you're unable to exploit current hardware to their full potential.

You can throw money at chip technology, but at some point you also have to try throwing money at software and OS developers. Computers have always been about software, not hardware.


We have two different definitions of minimal here.

My take on a minimalist design means that life is made easier for the user. E.g. process data so the user has conclusions to interpret instead of raw data (that fits a minimalist design, but takes more processing).

I see now that your take on minimalist design is to remove unnecessary features from software. I'd argue that nobody wants "useless fancy features"; unfortunately, what may be useless for me is necessary for you (otherwise why build those features anyway). So, it's all good to say let's keep to minimal designs (by whichever definition you mean), but in the end I don't see how that solves the problem that returns on hardware improvements are slowing, and frankly you haven't made the case that minimalist design solves the problem either.


> unfortunately, what may be useless for me is necessary for you

I don't understand that, I think I meant the opposite.

> and frankly you haven't made the case that minimalist design solves the problem either

You gave the example of the iPhone. The Apple platform is one of the least flexible platforms. In many places it requires learning an unpopular language, Objective-C, which is not used on existing non-Apple platforms for many reasons, the first one being that NeXTSTEP is owned by Apple. Then it requires approval from the Apple store.

John Carmack talked about those "layers of crap", they're present in most OS.

And what do you mean by "solving the problem"? Having a minimalist design allows you to leave more resources for other things; it doesn't "solve the problem", it just leaves room for more improvement.

> life is made easier for the user

So basically you're forcing everyone to walk on paths, and forbidding everyone to explore forests. I understand that it's better for the mainstream customers because they're not able to learn how to walk in forests, but flexibility is good too: you should still enable it.

In the end, when you buy hardware, most of the time you use the default feature set of the OS, nothing else. Web apps just use a sluggish, unreliable networking protocol which was designed to view static webpages.

So to sum up, people buy hardware to use Facebook and Twitter, listen to music, play a game, and that's what it is: a fancy, expensive Game Boy Color with chat. Except the hardware is 1000 or 100000 times faster and batteries last just as long.

I'm not making the case of solving a problem here, I'm just talking about the absence of improvement.


But there's a difference between a minimalist design and a design that looks minimalist. A minimalist design wouldn't need a physics engine because the things that need the physics engine are glitz effects. iOS 7 merely looks minimalist.


Yes, the parallax effect on the background image is glitz. I'd argue, however, that the physics engine is not glitz, but rather makes the device easier to use, because when things work and feel as we expect, it reduces the amount of mental effort we have to spend thinking about how to accomplish a task. E.g. when you can swipe a view to the side and it 'feels' just like swiping a piece of paper to the side, then you don't have to think about how to use the device; it all just makes sense. That's a smart design: minimalist in appearance, yet processor-intensive.


maybe we could introduce the concept of "resource minimalism"



