My Summer Vacation

Posted 28 September 2010 by admin

Blogging has been something that slipped off my radar for the past year. Here’s a reboot to see if I can get this going again. I’ll start with where my brain has been the past 9 months. Once I have this out, the plan is to start blogging about my thoughts on industry happenings, needs, etc. around embedded and multicore.

My activities for 2010 have consisted primarily of two things: 1) writing a new book and 2) taking a sabbatical.

The new book is entitled “Break Away with Intel Atom Processors” and was co-authored with Lori Matassa. This book is primarily targeted at embedded software developers considering a project involving Intel Atom processors. For more information, please consult:

I’m really quite excited about this book. Its theme is architecture migration. What was critical in shaping the content was figuring out what the average developer, familiar with other architectures, needs to know in order to move to Intel architecture. We needed to cover many different topics, such as basic architecture, operating systems, performance optimization, boot loader selection, and Intel technology; an entire book could be written on any one of them. We needed to find the right balance of technical detail: enough to get the reader comfortable, while also providing pointers to the more detailed ‘textbooks’ on the individual subjects. I believe we’ve struck this balance well and am hopeful that readers will agree.

Now, the really fun stuff – every 7 years my company allows employees to take an 8-week sabbatical. I lined mine up with my kids’ summer vacation. We spent a large chunk of July camping on the Oregon coast and in central Oregon. In August we drove to California for a Disneyland visit, followed by a 7-day Mexican Riviera cruise. Without a doubt, this was the best summer vacation I have ever had. I only had a few book items to work on during the summer, and I found it fascinating that I could be on a ship in the Pacific Ocean working: laptop connected to Wi-Fi connected to satellite internet. Just 10 years ago this did not exist – amazing how fast technology moves.

Category Uncategorized | Comments Off

Multicore – assimilating the new developments

Posted 9 October 2009 by admin

There was a brief pause in my blogging as I have been assimilating some of the new developments in the multicore realm.  Some of the items are close to home for me at my company with the acquisitions of Wind River, Cilk Arts, and RapidMind.  I’m not at liberty to comment on strategy, future plans, or anything along those lines right now and would instead refer you to my respected colleague James Reinders and his blog:

My thoughts are along different lines.  I recently noticed a couple of sites reposting articles that I drafted several years back – one article was 3 years old and the other 2.  On the plus side, it’s quite flattering that articles I’ve drafted are still considered relevant after this time.  On the negative side, the question is: if they are still relevant, is that because the ball hasn’t moved that far forward?  Have there been no breakthroughs in the past 3 years that would advance the state of the art and render my writing obsolete?

A couple of months back, I commented on Ramo’s book and tried to draw similarities with what’s happening in multicore software development.  I still believe that out of this clash of ideas will come a set of multicore programming models that provide increasing benefit to customers.  Call it round 2 if you will.

I’m interested to hear what people think.  How far has the ball been moved forward with regard to multicore software development in the past 3 years?


Big endian + Little endian byte order = biendian

Posted 29 July 2009 by admin

Today, I wanted to share a bit about a project I’ve worked on for a couple of years now, as we’ve had some recent success.

One fairly subtle issue in migrating code from one architecture to another is byte order.  For a more detailed summary of the issue, I encourage you to read Matassa’s paper on the subject.  Experienced developers know that it is very difficult to find and fix byte order issues in large code bases.

The project is based upon the following patent:

In more layperson’s terms:

Biendian Technology is a compiler feature that enables an application to execute with big endian semantics on little endian processors, or vice versa.  At the lowest level, the technology takes advantage of the bswap instruction, which converts 16-bit, 32-bit, and 64-bit values from one byte order to the other.  The programmer modifies the original source code to designate the byte order of data in the application using command line options, pragmas, and attributes.  During compilation, and where required, bswap instructions are inserted at the boundary between memory and registers.  In other words, if a region of code is designated to execute with big endian semantics, the data is stored in big endian format.  When a value is loaded into a register in preparation for an operation, a bswap instruction reorders it into little endian format.  If the resulting computation is then stored back to memory, a bswap instruction reorders the value into big endian format before the store.
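A minimal sketch of the load/store behavior described above, emulated in Python with the standard `struct` module (my own illustration of the semantics, not the compiler’s actual output):

```python
import struct

def load_be32(mem: bytes) -> int:
    """Load + bswap: a 32-bit value stored big endian in memory is
    reordered into the register's native (little endian) view."""
    return struct.unpack(">I", mem)[0]

def store_be32(value: int) -> bytes:
    """bswap + store: the native register value is reordered back
    into big endian byte order before it reaches memory."""
    return struct.pack(">I", value & 0xFFFFFFFF)

mem = store_be32(0x12345678)
assert mem == b"\x12\x34\x56\x78"        # big endian layout in memory
v = load_be32(mem) + 1                   # arithmetic on the native value
assert store_be32(v) == b"\x12\x34\x56\x79"
```

The compiler does the equivalent of `load_be32`/`store_be32` automatically at every memory/register boundary in a region designated big endian, so the source code itself needs no explicit swaps.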

That’s really it for today.  I just wanted to provide a bit of detail on one of my day jobs at my employer that keeps me excited working on technology to help customers take advantage of multicore processors.


Multicore in the Age of the Unthinkable

Posted 18 June 2009 by admin

Recently, I had the opportunity to read – and finish in one weekend – ‘The Age of the Unthinkable’ by Joshua Cooper Ramo.  My attempt at a quick summary:

In the past, world affairs were driven by nation-states, and the smaller number of players at this level of granularity, mixed with less communication (in both frequency and volume), made it possible to strategize, control, and influence this system.  Today, however, with the pace of change and the amount of communication between orders of magnitude more people, this sort of nation-state actor strategy is insufficient and leads to unpredictable and oftentimes the opposite of expected results, e.g. actions to counter terrorism lead to an increase in terrorism.  Ramo posits that counteracting the negative forces in the world requires a strategy that is immune-system-like in its response: a creative and multi-pronged approach.


As a computer scientist, I understood the concept of a world too complicated to predict outcomes.  Most computer scientists are exposed to Conway’s Game of Life very early in our educations.  Another, more esoteric example is trying to predict when a neural network being trained by backpropagation will suddenly reach equilibrium.  I recall in both cases looking at these types of programs and just thinking, ‘Wow.  Amazing.’ – simple actions by a large number of entities creating surprising results.
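For readers who haven’t seen it, a compact Game of Life step makes the point concrete – a few lines of rules, surprisingly rich behavior (a sketch of the standard rules, not tied to any particular implementation):

```python
from collections import Counter

def life_step(live):
    """One generation of Conway's Game of Life.
    `live` is a set of (x, y) coordinates of live cells."""
    # count live neighbours of every cell adjacent to a live cell
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # a cell is live next generation with exactly 3 neighbours,
    # or 2 neighbours if it is already live
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# a "blinker" oscillates between a horizontal and a vertical bar
blinker = {(0, 1), (1, 1), (2, 1)}
assert life_step(blinker) == {(1, 0), (1, 1), (1, 2)}
assert life_step(life_step(blinker)) == blinker
```

Two rules, a handful of cells, and already the global behavior (oscillators, gliders, chaos) is impossible to predict without simply running it.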


Now turning this to my work in multicore – I gave a talk last year where I referred to a thought of my colleague Tim Mattson, who compared what the industry is doing in multicore with the Draeger’s Grocery Store Experiment.  The conjecture is that perhaps we are pursuing too many ‘solutions’ to the problems of multicore software development and as a result are confusing customers.


Putting this all together – I’d like to suggest that all of these potential solutions are needed and inevitable.  Over time, the successful techniques will emerge; customers will move toward and employ the best ones.  Now, it would not be fun to be on the side of one of the losing technologies, so the question is how one can help one’s particular technology win.  This is where a multi-pronged strategic and tactical approach is needed.  Here are some thoughts on items that are typically treated as lower priorities compared to the obvious goodness of your particular tech:

  • Ease-of-use – There is a natural tradeoff between ease-of-use and performance, and the acceptable level of tradeoff is determined by the programmer’s preference and ability.  That said, bugs, poor documentation, and poor diagnostics can make a technology harder to use than it really should be.
  • Easy to understand – If it’s difficult to explain why your particular technology should be used over another, you’ve got a problem.  It may be the case that only a handful of techies at a customer company understand the details; however, being able to distill the pros and cons of a technology is critical.  For example, in 30 seconds, why would you use OpenCL over OpenMP or Pthreads?
  • Education – Customers need outlets to gain knowledge of a particular technology – typically, the more venues available for this learning, be it onsite classes, books, webinars, blogs, etc., the better.  Psychologically, querying the web and seeing numerous links with this type of information can be reassuring.
  • Open Standards – Customers frankly like choice, and open standards tend to foster more choice with regard to implementations.  Is your solution proprietary?  If so, should you consider standardization?


Best regards,




As an aside, I’d like to recommend a book from a colleague of mine.  Clay Breshears has recently published The Art of Concurrency.  I’ve taught classes on multicore with Clay; in fact, he was the author of much of the content from which we taught.  I’ve read some of the book and can say that it reads well, explains concepts clearly, and in the end makes understanding concurrency and multicore programming easier.  Congratulations, Clay, on a well-written book.


Post-ESC Impressions

Posted 13 April 2009 by admin

I had a good trip to Embedded Systems Conference, spending several hours on the expo floor, giving two talks in the general conference, and chatting with a bunch of colleagues.  Here’s a rundown on what I found noteworthy:

  • Keynote by T.K. Mattingly – I could listen to a former astronaut from the golden age of NASA for hours on end, so I enjoyed this thoroughly.  One of the things he said that resonated with me was something he had heard from a launchpad engineer (not quoting quite right): “this is not going to fail because of me.”  Takeaway: in big engineering projects, you may not understand everything that is going on, but you should know your role and do it well, simple as that.
  • Expo floor – The show seemed smaller by about 30% this year.  The adjoining hall that was packed with vendors in years past sat unused.  Floor traffic on Tuesday seemed light; Wednesday was much better.  NI’s booth is always amazing: visual acquisition, signal processing, and robot control using a trendy game for inspiration.
  • With regard to multicore, CriticalBlue’s Prism tool was a standout.  The ability to perform ‘what if’ modelling of expected performance gain on different parallelization scenarios is very compelling.
  • My talks – I gave two: “Debug Tools, Technologies & Techniques in the Multi-core Era” and “Case Studies in Software Optimization of Multi-core SMP”.  Both were decently attended, about 20 people per talk, which was a big question mark going in with the financial crisis and all.
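As an aside, the kind of ‘what if’ performance estimate such tools automate can be sketched with Amdahl’s law (my own back-of-the-envelope illustration, not Prism’s actual model):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: the upper bound on speedup when only part of
    a program's runtime can be parallelized across `cores`."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# What if 80% of the runtime parallelizes and we have 4 cores?
assert round(amdahl_speedup(0.8, 4), 2) == 2.5
# The serial fraction dominates: 50% parallel caps speedup below 2x
# no matter how many cores you throw at it
assert amdahl_speedup(0.5, 1000) < 2.0
```

Real tools refine this with measured profiles, dependence analysis, and communication costs, but the basic ‘scenario in, projected gain out’ shape is the same.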

Regarding my talks, here’s the gist of what they are about and what I’d consider to be the compelling portions of each:

  • Debug Tools … – A survey of software development tools and techniques for debugging multi-core.  What is compelling: I suspect several in the audience learned about a new tool or a new technique to which they had not been exposed previously.  There is no silver bullet, no one-tool solution to multicore debug.  Instead you need to apply a number of techniques and tools to the problem.
  • Case Studies … – A review of the threading development process and its application to two real-world applications.  What is compelling: attendees see how the steps in the process make sense as they guide what is done in optimizing the application.  For example, the initial performance analysis helps you learn the application and feeds into which portion you should focus on for optimization.  It may seem obvious, but seeing it on a real application, not a toy program, is quite nice.

Best regards, Max


MPP: On Documenting Best Known Methods for Multicore

Posted 12 March 2009 by admin

I don’t mind the travel restrictions imposed by the economy because I’ve been fortunate to not have to travel until now.  Next week, I’m taking a day trip to Santa Clara to attend the Multicore Association Board Meeting and to discuss status on the working group I co-chair, Multicore Programming Practices.

The group, composed of technical leaders from a variety of embedded software companies, has been iterating on an outline of the document for the last 4 months.  The outline, which weighs in at ~30 pages and 7 chapters, is structured after a typical software development project, e.g. analysis, design, debug, and performance tuning.  The team has now split up to tackle the writing of 3 of the chapters: those focused on 1) an overview of available technology, 2) analysis and high level design, and 3) performance tuning.

The challenge the group faces is sufficiently explaining the material detailed in the outline while staying within the targeted page budget.  Very early on, David and I wanted a document that was more than a whitepaper but much less than a book, so ~100 pages felt about right.  This means we’ll be trying to distill the need-to-know information into, for example, about 20 pages for the analysis and high level design chapter.  The team will obviously reference backing material where necessary, but I suspect the highly technical engineers on this project will want to explain topics in minute detail and will be challenged to be brief.

I’m looking forward to reporting our progress at the board meeting on 3/16 and also seeing the results of the initial writing that will be completed this month.  




Embedded multicore interview followup

Posted 20 February 2009 by admin

Had a nice interview and mention in an article by John Blyler.

With regard to the issue identified in the story and video (coordination of process, thread, and vector level parallelism), I think there are both positive and negative aspects for embedded developers.  The negative aspect is that much of the work on this issue is being done in the desktop and server space.  The balance between OpenMP and automatic vectorization mentioned in the article is available in a compiler for desktop and server; to my knowledge, OpenMP is not supported on traditional embedded OSes.

The positive aspect is that embedded developers typically have more control over the other applications that may be executing on the system.  For example, you may have one process taking full advantage of all the cores on the system; however, if another application is doing the same, you will typically end up with non-optimal performance.  On a desktop system, a developer doesn’t typically know which other applications a customer may choose to run.

An embedded systems developer would have better knowledge of other apps on the system and would be in a better position to do something about it.
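A toy sketch of the oversubscription point, with hypothetical application names (the apps and numbers are made up purely for illustration):

```python
import os

# ensure at least 2 cores so the arithmetic below is meaningful
cores = max(os.cpu_count() or 4, 2)
# hypothetical applications co-resident on one embedded system
apps = ["packet_engine", "control_plane"]

# naive sizing: each app builds a thread pool spanning the whole
# machine, so together they oversubscribe the cores 2x
naive = {app: cores for app in apps}
assert sum(naive.values()) == 2 * cores

# an embedded developer who knows the full workload can instead
# partition the cores between the apps up front
partition = {app: cores // len(apps) for app in apps}
assert sum(partition.values()) <= cores
```

A desktop developer can’t do this partitioning because the co-resident workload is unknown; the embedded developer usually can.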

Just thought I’d share some further thoughts on the interview.




Embedded Multicore Debug

Posted 9 February 2009 by admin

Early February finds me working on talks and papers for Embedded Systems Conference Silicon Valley.  I really do enjoy attending the conference every year and they really do treat their speakers well.  I’ve had the opportunity to hear keynotes from Al Gore and Dean Kamen (Segway inventor).  This year’s keynote is from Ken Mattingly of Apollo 13 fame.  How cool is that!

Of course, being a speaker at the conference involves real work, putting together a talk that will be appreciated by embedded developers with different experiences, backgrounds, interests, etc.

Over the past two weeks, I’ve been working on the first of my two presentations, a talk on embedded multicore debug.  This topic is very broad and I’m not an expert in all of the areas so I’m learning a great deal as I put together this talk.

What I’ve found is that multicore debug is not a solved problem.  It is very difficult, and the technology for helping is somewhat early in its maturation.  A positive spin: there are lots of opportunities for innovation in this area.  My talk will cover technology such as static analysis, simulators, thread verification tools, and hardware assisted tracing.  It’s a beginner level talk, so I can’t go too deep into each of these areas.
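To give a flavor of the kind of bug these tools hunt for, here is the classic shared-counter data race in miniature, with the lock that fixes it (a generic illustration, not an example from the talk):

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    """Guard the read-modify-write with a lock.  Without the lock,
    two threads can interleave the read and the write of `counter`
    and silently lose updates -- the kind of intermittent failure
    that thread verification tools exist to catch."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(10000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 40000  # deterministic only because of the lock
```

The racy variant fails only occasionally and under load, which is exactly why single-stepping in a debugger rarely reproduces it and why the talk surveys multiple complementary techniques.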

Anyone care to disagree with my statement – multicore debug is not a solved problem?

Best, Max


5 months in multi-core – what has changed?

Posted 27 October 2008 by admin

This week found me in Zurich, Switzerland, delivering a talk to researchers. The purpose of my talk and the other talks at this symposium was to share what different companies and researchers are doing to help “solve” the multi-core programming software challenge.

My content was similar to a talk I delivered in Japan earlier this year. The topic of this blog post will be a reflection on what, if anything, has changed with regard to multi-core in 5 months.

Just for grins, here’s the abstract of the talk:

The state of the art for optimizing and programming for parallelism on multi-core processors is evolving with many programming models being offered as the possible “solution” that software developers should use. Some would argue that there are perhaps too many such solutions being considered and some consolidation should occur. This talk shows the multicore programming technologies both currently available and being evaluated in the Intel® C++ Compiler. We’ll look at some different parallelism methods, such as software transactional memory, OpenMP 3.0, array notations and offer insight into what is guiding development of each.

Probably the most interesting change for my talk in the last five months has been the announcement of Intel Parallel Studio. The toolkit comprises four tools: Intel Parallel Advisor, Intel Parallel Composer, Intel Parallel Inspector, and Intel Parallel Amplifier.

Of course I work at Intel, so I know a few more details on these tools but cannot share them at this point. However, on the surface, I’m very excited by the Advisor tool, which aims to “Gain insight on where parallelism will benefit existing source code.” I believe this is a key area of the multi-core development cycle that has relatively little tool support today. In addition, the tool targets developers who cannot necessarily throw their current implementation away and redesign. This particular theme lines up with my motivation and work with David Stewart on the Multicore Programming Practices working group, which targets existing and legacy applications.

I believe other software vendors are developing or are soon to make available tools with similar capabilities. This is good news. The availability of this type of tool can only help programming for multi-core so I’m excited to test drive them as soon as they are available.

On a personal note: Zurich is a beautiful city. One cool portion of my trip was attending Sunday service at the Fraumunster Kirche (building with the large clock tower in the background). Amazing.


Thoughts on the endstate of multicore software development

Posted 19 September 2008 by admin

I recently read two interesting articles on multi-core programming from different angles. One is from a noted compiler writer who is a creator of development tools that enable multicore/multiprocessor programming:

The second article is an interview with a noted game programmer, who would be a consumer of such development tools:

Both of these are good reads. Both of the articles make arguments about the need for easier parallel programming. Dr. Wolfe posits about what is realistic from the point of view of a creator of tools. Mr. Sweeney discusses what customers need and essentially argues that the development tools need to do the lion’s share of the work.

I believe that for multicore processors to significantly impact the industry (impact means most customers derive benefit, which means most developers take advantage of parallelism in one form or another), the end state for parallel programming is that it will blend into the background and in a sense be taken for granted by developers.

In addition, I think it follows that any mass-market product that has a parallelism-centric purpose is a sign that we are not at that end state yet.

Take, for example, Intel Threading Building Blocks and the recently announced Intel® Parallel Studio. Intel TBB is a great library, and I think Intel Parallel Studio will help customers tremendously. But the question is … does the average developer care enough about multicore processors to take on these parallelism-centric tools?

I suspect it may be a bit too much for the average developer, but I do think tools like these will broaden the developer base taking advantage of parallelism and perhaps that is the best that can be asked.

Putting it all together – perhaps the end state is one where:

  • the average developer (apprentice) doesn’t have to care about parallel programming to take advantage of multicore – they derive benefit from domain-specific parallel libraries;
  • the experienced developer (journeyperson) increases the performance of their application by using first-class language support for concurrency and takes good advantage of features in parallelism-centric tools where needed;
  • the expert (architect) designs in the use of key concurrency features in their application and wields these parallelism-centric tools with ease.
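A sketch of the apprentice tier: the parallelism lives entirely inside the library, and the caller never touches a thread (illustrative only – any parallel map-style library makes the same point):

```python
from concurrent.futures import ThreadPoolExecutor

def heavy(x):
    return x * x  # stand-in for an expensive, independent computation

# the "apprentice" just calls map(); the executor owns worker
# creation, work distribution, and teardown behind the scenes
with ThreadPoolExecutor() as pool:
    results = list(pool.map(heavy, range(10)))

assert results == [x * x for x in range(10)]
```

The result is identical to the serial loop; the developer derives the benefit of the cores without writing a line of explicitly parallel code, which is exactly the end state the bullet describes.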




Domeika’s Dilemma is powered by WordPress