Taken for Granted

ESL, embedded processors, and more

“42” is not the answer

Filed under: Uncategorized — November 22, 2008 @ 1:15 am

I was working on a presentation that I will be giving at VLSI 2009 in New Delhi in January of next year, and was consulting the 2007 ITRS (International Technology Roadmap for Semiconductors), in particular the “System Drivers” chapter. It has a couple of interesting charts showing the number of processing elements (PEs) predicted over the next 15 years in Consumer Portable devices (such as mobile phones with extensive media capabilities, or digital cameras):

ITRS 2007 SoC Consumer Portable Design Complexity Trends


and the number of Data Processing Engines (DPEs) in Consumer Stationary devices (such as high-end game-playing machines):

ITRS 2007 SoC Consumer Stationary Design Complexity Trends


What is interesting is to look at the numbers of PEs and DPEs in the 2022 prediction in these two charts. You need really good eyesight, or the original report, to read them: 1435 PEs for consumer portable (!) and 407 DPEs for consumer stationary (!). The consumer stationary chart also has 50 main CPUs, under the assumption that one main CPU controls 8 DPEs.
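As a quick sanity check on those consumer stationary numbers, here is a minimal sketch of the arithmetic. It assumes only the figures quoted above (407 DPEs, 8 DPEs per main CPU); the function name is hypothetical and this is not the ITRS's own model, just integer division:

```python
# Assumption taken from the text above: one main CPU controls 8 DPEs.
DPES_PER_MAIN_CPU = 8


def main_cpus_for(dpes: int, dpes_per_cpu: int = DPES_PER_MAIN_CPU) -> int:
    """Number of main CPUs that each get a full group of DPEs."""
    return dpes // dpes_per_cpu


dpes_2022 = 407  # consumer stationary DPE count quoted for 2022
full_groups = main_cpus_for(dpes_2022)
leftover = dpes_2022 - full_groups * DPES_PER_MAIN_CPU

print(f"{full_groups} main CPUs, {leftover} DPEs left over")
```

So 50 main CPUs with 8 DPEs each accounts for 400 of the 407 DPEs, which is roughly consistent with the chart.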

The hockey sticks in both these graphs show a great increase in the number of processors in these two classes of devices.   We have seen an increase in the number of “cores” in multicore chips and “processors” in MPSoC chips in the last few years.   The ITRS predicts a profound increase in the numbers of cores and processors over the next 15 years.

This leads to the question (to paraphrase the rhetorical question “how many angels can dance on the head of a pin?”), “How many processor cores can actually be used on a chip?”

We know from practical experience that real chips with 40 processors (the Cisco QuantumFlow Processor), 80 processors (the Intel Teraflops research chip), 128 cores (the NVIDIA Tesla C870) and 188 processors (the Cisco Silicon Packet Processor) exist today in real products or in real labs. Therefore, the idea that process technology evolution along the lines of Moore’s Law will make it possible to put either 400 or 1400 processors on a chip, along with a whole host of memory, by 2022, does not seem outlandish.

But the big “multicore” and MPSoC question of the last couple of years has been – how on earth do we programme these devices? What programming models, tools and methods will exist to let us cope with 1400 processors? Will only the unembarrassingly parallel applications be able to make use of this SoC complexity? Or can we find ways to make use of all this concurrent processing resource?

A chat today with my colleague Steve Leibson, a friend with considerable wisdom as evidenced by his blog Leibson’s Law, inverted the question for me. Rather than worrying about how to programme a device with so many cores using today’s thinking, we can ask “what kinds of new applications might be enabled with this kind of computing resource?” and also “are there computational models impossible to implement effectively today that this kind of resource might enable?” For example, if we really had embedded chips with 1000 or 2000 processor cores on a single die, would the grad students of 2022 be able to come up with ways to finally implement effective speech recognition or brain wave interpretation? Will neural network models be possible with this level of integration? Indeed, “if we build it, will they come?”, the “they” being designers and students currently in elementary school who will be around when this kind of integration is possible and who will approach interesting problems in ways we cannot even begin to imagine.

By 2022, I might be retired – or at least, have slowed down a lot more! But the intersection of bright minds and advanced technologies will give us capabilities hard to envisage today. The title of this post is of course inspired by Douglas Adams. Given the wise advice of the Hitchhiker’s Guide, “Don’t Panic!”, one thing I can say is that the answer to the question “How many processor cores can actually be used on a chip?” is likely to be way more than “42”!

6 Comments

  1. Patrick Madden:

    Must… keep… head… from… exploding!

    Seriously, though — these are the same charts they drew for Thinking Machines, the Inmos Transputer, and dozens of other flame-outs. This has been going on for more than 50 years, playing out the same way every time.

    Here’s a fun thing to try: go to NSF, and search for funded proposals with the word “parallel” in the title.
    This link should do the search.
    More than 1400 projects, the vast majority of which are computation-related. The earliest is from 1972. There are even earlier ones (and many more) if you look for things other than parallel in the title. Lots of funded projects are missing; the NSF records are a bit sketchy early on. And this is just NSF; plenty of stuff at DARPA, DOE, and elsewhere. I don’t believe we will find a solution if we just keep rehashing prior failures (which is, IMO, what’s happening).

    What have we found for all this effort across multiple decades? A handful of (un)embarrassingly parallel applications, where the architecture is usually a simple mesh, the parallelism is obvious, and the application itself can be coded easily in assembly, Fortran, or C.

    In some of my classes, I’ve been using Tinkerbell Engineering to describe the belief in parallel computing. If only enough people believe in magic, it will come true!

    I do agree with Steve that we need to look at fundamentally new kinds of applications (and these applications have some heavy-duty constraints if they’re going to be scalable). Believe it or not, that’s what I’m up to these days.

  2. Grant Martin:

    Patrick, interesting comment, but I think you missed my point about the year 2022. I am skeptical about our ability to exploit such massive concurrency in the next few years, except for the (few) unembarrassingly parallel applications that we have or will identify; but by punting the problem into the future and to those grad students who will be around when I am almost 70 (!!), we can nicely limit our near-term debates to the few tens of cores, the conveniently concurrent applications, design by composition using heterogeneous MPSoC, etc. The problem of preparing the next many waves of students and generations to the point where they might have new imaginative insights into such problems is, thankfully, yours (and your colleagues’) and not mine. And it is great to know that you are on the case!!!
    As for me, I’ll stick to the near term problems ….

  3. Karri Ranta-aho:

    Stumbled upon this post by accident, but can’t help voicing another interpretation of the graphs even though this may be very much off topic.

    Rather than thinking of how to program the 1000 processing engines, perhaps we should look at what the processing engines are set out to do.

    Just taking my cell phone. It has two cameras, both have their own dedicated DSP and a specific data processing engine to get the camera to the view finder. It has two displays, both having their respective drivers. It has 5 radios, (Bluetooth, WLAN, GPS, GSM and WCDMA/HSDPA), and I can only imagine the number of dedicated processing engines used to provide them. It has a number of external connections, all of them having dedicated hardware (USB, SIM, MMC, keyboard, …).

    You catch my drift. For each new specialized thing a device is designed to do there is easily a bunch of dedicated processing engines that are not multi-purpose, but very much application specific IP blocks or DSPs.

    This makes a (poor, but still an) analogy to the parallelism mentioned in the original article. As the tasks devices are designed to do are by nature very much parallel and specific-purpose, the conventional thinking of a computer program as a sequence of instructions a device needs to execute is just wrong; the device is all the time doing a large number of naturally parallel things, but they have been serialized by the programming mindset of solving one big task rather than a ton of small tasks.

  4. Grant Martin:

    Karri

    Your comment is bang-on, and your notion of parallelism as a large number of concurrent small tasks is a very fine example of concurrency (concurrency is perhaps a better word for it than parallelism). The asymmetric concurrency you see in the cell phone, with multiple subsystems composed together, each subsystem using specialised application-specific processing, is the kind of “convenient concurrency” that my colleague Steve Leibson and I have been writing about. See Steve’s brief note on his blog: http://www.edn.com/blog/980000298/post/240020224.html??text=convenient+concurrency
    and the longer note we wrote on SCD Source about “convenient concurrency”, at http://www.scdsource.com/article.php?id=87
    (back in January of 2008).
    Thanks for the comment!

  5. » RAMS & Multi-Core Embedded Systems -- System Modeling Perspectives:

    [...] up with MANY homogeneous general purpose cores. Grant Martin lists a few of these in his post “42” is not the answer over on Taken for [...]

  6. Paul Stoaks:

    I think you’re right about the “next generation” having a very different viewpoint on this problem that allows them to leapfrog the old school. You know who’s going to get it first, don’t you? It’s going to be all those kids in Lego Robotics programs that are using National Instruments tools to do what is really, at the heart, parallel programming. I’m not suggesting that they’re really doing parallel programming now (there’s only one processor on the robot after all.) But the approach they’re taking (really visual programming using a type of DFD) inherently captures “convenient concurrency”. Whether the ultimate implementation is dedicated hardware, ASIP, or thread on a GPP core will be a design->implementation transformation problem. Concurrency issues are resolved at the model level.

    For larger, more complex multi-processor systems, it’s essential to do more high-level, simulation-based analysis to determine the correct platform design. (Simple trial-and-error at the model level won’t yield satisfactory results for these systems.) But these kids are being indoctrinated in an approach that is inherently parallel and somewhat platform indifferent, unlike you and me, who were “brought up” on sequential software that was very aware of the underlying platform.

    You’ll notice that this hasn’t escaped NI. Check out their recent magazine ads…

    Merry Christmas!
    paul

    [As an aside, do you know what just struck me? OOA/OOD isn't really much help with this problem, is it? Not that it can't be applied, just that it doesn't really assist the designer in formulating the solution. Hmmm.]
