Archive for June, 2011

Jun 30 2011

ARM Mali GPU Unifying graphics across platforms

Published by under Uncategorized

ARM recently announced an update on its Mali GPU.  The RTL-level core has now been licensed by 46 parties, of which 6 have released products and are royalty-paying partners.  A Mali GPU is the graphics engine in the Samsung Galaxy S2 phone.

The goal of the Mali program is to deliver high-performance graphics at higher resolutions within the same power budget.  The shifting application space requires the same user experience across phones, tablets, netbooks/laptops, standard display monitors and large screens. This range runs from small sub-3″ displays at VGA resolution through 60 Hz 4K2K displays. Performance across these display sizes has to account for the changes in memory bandwidth and for the driver power of the external memories.  The GPU block is designed to be optimized for ARM CPUs and to minimize external memory accesses, which consume more power (Figure 1).

Figure 1: ARM Mali GPU and CPU Architecture
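
To put the display range in numbers, the following is a rough back-of-the-envelope sketch rather than anything from ARM. It assumes VGA means 640x480 refreshed at 60 Hz and that "4K2K" means 3840x2160 (some panels use 4096x2160), and the helper names `pixels_per_second` and `passes_per_pixel` are made up for illustration.

```cpp
#include <cstdint>

// Raw pixels per second needed to refresh a display once per frame,
// ignoring overdraw, blending and multi-pass rendering.
constexpr std::uint64_t pixels_per_second(std::uint64_t width,
                                          std::uint64_t height,
                                          std::uint64_t refresh_hz) {
    return width * height * refresh_hz;
}

// Assumed resolutions: VGA as 640x480, "4K2K" as 3840x2160, both at 60 Hz.
constexpr std::uint64_t kVgaRate  = pixels_per_second(640, 480, 60);
constexpr std::uint64_t k4k2kRate = pixels_per_second(3840, 2160, 60);

// How many times each displayed pixel could be written per frame at a given
// fill rate (integer division, so the result is rounded down).
constexpr std::uint64_t passes_per_pixel(std::uint64_t fill_rate,
                                         std::uint64_t display_rate) {
    return fill_rate / display_rate;
}
```

Under these assumptions the 4K2K panel needs 27 times the raw pixel rate of the VGA panel, and a 5 Gpixel/sec fill rate (the desktop figure quoted in this post) leaves room for about 10 writes per displayed pixel per frame.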

The multi-core design targets large data sets and scaling with a DX7-style API, and provides performance at levels that are DirectX 11 compatible for desktops (5 Gpixel/sec or 250 GFLOPS) and OpenGL ES 2.0 compatible for mobile (1.5 Gpixel/sec or 25 GFLOPS).  In addition to these increasing data rates, maintaining image quality requires more processing per pixel.  The same processing core must also handle user interfaces such as touch, gesture, multi-touch, 3D and other technologies that are engaging and also simplify the user experience.

Figure 2: ARM Mali T604 Overview

The Mali-T604 uses the “Midgard” GPU architecture (Figure 2).  The design is scalable up to 4 cores and supports the full profile of OpenCL, OpenGL ES and OpenVG, as well as Microsoft DirectX up to version 11.  The GPU line, which is optimized for Android, follows ARM's 2006 acquisition of Falanx, and the key is to bring it to market as the graphics portion of devices becomes more dominant.  The products are entering the market late, and ARM hopes to catch up on the coattails of its processor core dominance.  The core competes directly against entrenched products from Nvidia, Imagination Technologies, Intel, Qualcomm, Marvell and others.

PC

Jun 30 2011

Nvidia & Microsoft C++ AMP update

Published by under Uncategorized

With the rise of multi-core systems and distributed computing, performance optimization is spreading to software that uses both the CPU and the GPU as compute engines.  The fundamental architectural distinction between the two is that CPUs are sequential cores, while GPUs are primarily parallel, pipeline-style cores optimized for computation.  Currently, such code is written in Fortran, Java, Python and, using directly addressed functions from the GPU provider, C++.  Microsoft is dedicated to supporting heterogeneous parallel computing across multi-core, distributed-core, cloud-core and mixed CPU/GPGPU systems with a new standard extension to C++ called C++ AMP.  The C++ AMP (Accelerated Massive Parallelism) compiler should, as claimed by Microsoft, return C++ to offering the best performance per watt of any development environment.

The AMP extension was added to C++ rather than C because C++ is the more mainstream language, and the choice was meant to future-proof the extension through the language's popularity.  The extension is based on lambda functions/objects, which are documented as part of the C++0x extensions, for compute awareness.  C++ AMP is characterized by the addition of the array_view type and the restrict keyword.  The restrict(class) syntax identifies the class of processor, parallel or serial, on which a function may run.  For example, restrict(direct3d) indicates execution by any DirectX 11 GPU device.
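
The lambda-over-an-index-space style described above can be sketched in standard C++. This is not C++ AMP code and does not use its headers: `view_sketch`, `make_view`, `for_each_index_sketch` and `add_vectors` are hypothetical names invented here, the loop runs serially on the CPU, and the restrict clause is only mentioned in a comment because standard compilers do not accept it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the role array_view plays: a non-owning window
// onto memory that already lives in a host container.
template <typename T>
struct view_sketch {
    T* data;
    std::size_t size;
    T& operator[](std::size_t i) const { return data[i]; }
};

template <typename T>
view_sketch<T> make_view(std::vector<T>& v) {
    return view_sketch<T>{v.data(), v.size()};
}

// Hypothetical stand-in for a data-parallel launch: the kernel is called once
// per index. It runs serially here; an accelerator runtime would be free to
// run the calls concurrently because none depends on another.
template <typename Kernel>
void for_each_index_sketch(std::size_t extent, Kernel kernel) {
    for (std::size_t i = 0; i < extent; ++i) kernel(i);
}

// Element-wise sum written as a lambda over an index space.
std::vector<int> add_vectors(std::vector<int> a, std::vector<int> b) {
    assert(a.size() == b.size());
    std::vector<int> sum(a.size());
    auto va = make_view(a);
    auto vb = make_view(b);
    auto vs = make_view(sum);
    // In C++ AMP the lambda would carry a restrict(...) clause after its
    // parameter list; standard C++ has no such clause, so it is omitted.
    for_each_index_sketch(sum.size(), [=](std::size_t i) {
        vs[i] = va[i] + vb[i];
    });
    return sum;
}
```

The point of the shape is that the body of the lambda touches only element `i`, so the same source could be handed to a CPU loop or to a GPU launch without change. Calling `add_vectors({1, 2, 3}, {10, 20, 30})` yields a vector holding 11, 22 and 33.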

To help simplify the tasks of mapping memory between multiple architectures, clearing and grouping the low-level parallelization of the code, and optimally using the available cores, the C++ AMP extensions have been included in Microsoft Visual Studio.  This allows for the full visual development and debug environment, including the ability to suppress errors/messages, that has been the hallmark of C++ development.

Nvidia also showed new extensions to the CUDA primitive language and the new release of Thrust v4.0.  Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that enhances developer productivity (http://code.google.com/p/thrust/). The new Thrust library serves those using standard C++ compilers (those not enabled with AMP extensions).  This open-source, high-level library hides the details of determining the parallel granularity of blocks and threads, and it handles moving data between CPU memory and GPU memory so that data is neither stranded nor operated on while out of sync.
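
To show what an STL-resembling interface looks like in practice, here is a minimal sketch that uses only the C++ standard library, so it builds without the CUDA toolkit. The function names `scaled_sum` and `sorted_descending` are made up for this example. Thrust provides `thrust::transform`, `thrust::reduce` and `thrust::sort` with the same iterator-range shape, typically applied to a `thrust::device_vector` so that the library manages the copies to and from GPU memory.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Scales every element and then sums the result, written as two whole-range
// algorithm calls instead of hand-written loops.
int scaled_sum(const std::vector<int>& input, int factor) {
    std::vector<int> scaled(input.size());
    // Standard-library call; thrust::transform takes the same
    // (first, last, output, function) arguments.
    std::transform(input.begin(), input.end(), scaled.begin(),
                   [factor](int x) { return x * factor; });
    // Standard-library call; thrust::reduce plays this role in Thrust.
    return std::accumulate(scaled.begin(), scaled.end(), 0);
}

// Returns a copy sorted from largest to smallest; thrust::sort accepts a
// comparison functor in the same position.
std::vector<int> sorted_descending(std::vector<int> values) {
    std::sort(values.begin(), values.end(), std::greater<int>());
    return values;
}
```

Nothing in these calls mentions blocks, threads or memory transfers, which is the kind of detail the post says the library takes over. For instance, `scaled_sum({1, 2, 3, 4}, 3)` returns 30.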

PC
